1 Introduction


The organization of the lexicon, and especially the relations between groups of 
lexemes, is a strongly debated topic in linguistics. Some authors have insisted on 
the lack of any structure in the lexicon. In this vein, Di Sciullo & Williams (1987: 
3) claim that “[t]he lexicon is like a prison - it contains only the lawless, and the
only thing that its inmates have in common is lawlessness.” In the alternative
view, the lexicon is assumed to have a rich structure that captures all regularities 
and partial regularities that exist between lexical entries. 

Two very different schools of linguistics have insisted on the organization of 
the lexicon. On the one hand, for theories like HPSG (Head-driven Phrase Structure
Grammar) (Pollard & Sag 1994), but also some versions of construction grammar
(Fillmore & Kay 1995), the lexicon is assumed to have a very rich structure
which captures common grammatical properties between its members. In this
approach, a type hierarchy organizes the lexicon according to common properties
between items. For example, Koenig (1999: 4, among others), working from
an HPSG perspective, claims that the lexicon “provides a unified model for partial
regularities, medium-size generalizations, and truly productive processes.”

On the other hand, from the perspective of usage-based linguistics, several au- 
thors have drawn attention to the fact that lexemes which share morphological 
or syntactic properties tend to be organized in clusters of surface (phonological 
or semantic) similarity (Bybee & Slobin 1982; Eddington 1996; Skousen 1989). This 
approach, often called analogical, has developed highly accurate computational 
and non-computational models that can predict the classes to which lexemes be- 
long. Like the organization of lexemes in type hierarchies, analogical relations 
between items help speakers to make sense of intricate systems and reduce
apparent complexity (Köpcke & Zubin 1984).

Despite this core commonality, and despite the fact that most linguists seem to 
agree that analogy plays an important role in language, there has been remark- 
ably little work on bringing together these two approaches. Formal grammar tra- 
ditions have been very successful in capturing grammatical behaviour but, in the 
process, have downplayed the role analogy plays in linguistics (Anderson 2015). 
In this work, I aim to change this state of affairs in two ways: first, by providing
an explicit formalization of how analogy interacts with grammar, and second, by
showing that analogical effects and relations closely mirror the structures in the
lexicon. I will show that both formal grammar approaches and usage-based analogical
models capture mutually compatible relations in the lexicon.

This book is divided into two parts. Part I consists of two chapters. Chapter 2 
presents a summary of the most relevant work on analogy and delimits the exact 
kind of analogy I will focus on in the rest of the book. Because of its longstanding 
tradition in linguistics, there are various definitions and uses of analogy, not all 
of which are relevant to the present investigation. Chapter 3 presents the basic 
tools for integrating analogy into grammar and introduces the main system and 
its predictions. This chapter contains the main theoretical claim put forward in 
this book, namely that analogy is intrinsically linked to type hierarchies in the 
lexicon. 

Part II is divided into six chapters, containing nine case studies. Chapter 4 in- 
troduces the neural networks used for modelling analogy and discusses the basic 
tools for evaluating model performance (kappa scores and accuracy). Chapter 5 
presents two case studies on the gender-inflection class interaction in Latin and 
Romanian. In these examples I show how the correlations and discrepancies be- 
tween gender and inflection class in nouns can be modelled using multiple inheri- 
tance hierarchies, and how the shapes of these hierarchies are clearly reflected in 
the analogical relations. Chapter 6 discusses the effects of hybrid types in mor- 
phological phenomena in Russian and Croatian. These two languages present 
cases where for a single morphological property, the grammar offers two mu- 
tually exclusive, competing alternatives. In Russian, I show an example from 
derivational doubletism in the diminutive system, and in Croatian I present an 
overabundance example from the instrumental singular. Chapter 7 explores systems
where the morphological process clearly has an effect on the features analogy
operates on. The use of prefixes for inflection in Swahili and Otomi causes
the analogical relations to take place mostly at the beginning of the stems. In
Hausa, due to the use of broken plurals, the analogical models require a much 
more structural representation. Finally, Chapter 8 deals with two systems that 
show high complexity and a large number of inflection classes: Spanish verb in- 
flection, and Kasem plural and singular markers. In both Spanish and Kasem, 
the inflection class system requires multiple inflectional dimensions that oper- 
ate independently from each other, but interact to produce the inflection classes 
of verbs (Spanish) and nouns (Kasem). In both of these examples we see clear 
reflexes of the multiple dimensions of inflection in the analogical relations. 

The two most important chapters are Chapters 3 and 8. The chapters in Part II
stand on their own and are mostly self-contained. The empirical results reported
in these chapters stand independently of the theory of this book.


2.1 The many meanings of analogy


Analogy can be defined in many ways, and it can be ascribed to various kinds of
processes. The literature on analogy is vast and covers all sorts of phenomena and 
domains. Most work on it focuses on phenomena that are not directly relevant to 
the overall question of this book, but which are related in some way or another. 
In linguistics, the term analogy is usually employed whenever a process makes 
reference to direct comparison of surface items without making use of general 
rules, or when phonological or semantic similarities are involved, which are not 
easily captured as categorical generalizations. However, as a concept, analogy is 
rather fuzzy, and has no precise or unique definition. In the following subsections, 
I briefly mention some of the different phenomena for which the term analogy 
has been used, and in the final section of this chapter I focus on the actual kind 
of systems I will address in the present book. 

Doing justice to the history of analogy in linguistics would require a book
(or several) of its own. Extensive discussions of the development of analogy as
a concept in linguistics can be found in Anttila (1977), Rainer (2013) and, most
extensively, Itkonen (2005).


2.1.1 Single case analogy


The simplest form of analogy is a similarity relation between two single items 
that plays a certain role in triggering or blocking a phonological or morpholog- 
ical process. An example of this type of analogy has been proposed to explain 
unpredictable new coinages and neologisms that make use of unproductive mor- 
phemes or non-morphemes (Motsch 1977: 195, see also Butterworth 1983). In such 
cases, a newly coined form does not make use of any derivational morphologi- 
cal process but is directly built on the basis of some existing form instead. Booij 
(2010: 89) cites the examples in (1): 


(1) a. angst-haas → paniek-haas
       fear-hare    panic-hare
       ‘terrified person’ → ‘panicky person’
    b. moeder-taal     → vader-taal
       mother-language   father-language
       ‘native language’ → ‘father’s native language’
    c. hand-vaardig → muis-vaardig
       hand-able      mouse-able
       ‘with manual skills’ → ‘with mouse-handling skills’


In these three cases, the items haas ‘hare’, taal ‘language’ and vaardig ‘able’ are
not derivational morphemes and cannot productively be used in other combinations.
These are direct analogical formations because the new coinage is built
from an existing compound. Various examples that follow similar processes can
be found in other languages as well, as can be seen in (2)-(4):


(2) German
    Früh-stück  → Spät-stück
    early-piece   late-piece
    ‘breakfast’ → ‘late breakfast’
(3) English
    handicapped + capable → handicapable
(4) Spanish
    perfumería + super → superfumería
    perfume store + very → ‘large perfume store’


These are single case analogies because they are single formations based on 
the similarity to one or two words and not assumed to be a systematic (and pre- 
dictable) mechanism of the language. This kind of process is not predictably pro- 
ductive, and there are no generalizations about when or where it can apply, but 
the process seems to be constantly available to speakers. 

Within the rubric of single case analogies, there are multiple kinds of processes
(Anderson 2015: 278). Some of these are: blending, where two words are
joined together to form a new word, as in breakfast + lunch → brunch (also the
examples in (4)); back formation, where a new base is created for what appears
to be a derived form, like the creation of the verb edit from the older noun editor
(compare however van Marle 1985 and Becker 1993); folk etymology, where
speakers infer the wrong etymology of a word based on analogy to another word
(one such example is the word vagabundo ‘homeless person’ in Spanish, which is
often thought to come from vagar ‘walk aimlessly’ and mundo ‘world’, and this has
led people to think it should be vagamundo); and affix-based analogy (Kilani-Schoch
& Dressler 2005), where an apparent base-affix is extended to new contexts, as
in the French atterrir ‘to land’, from terre ‘earth’ → amerrir ‘to land on the sea’,
from mer ‘sea’ → alunir ‘to land on the moon’, from lune ‘moon’.¹ Although
there are clear differences between these processes, these cases of analogy are
all based on individual specific items and do not really involve abstraction across
categories.

In language change we also find examples of single case analogies, where the
existence of a form prevents another form from following its expected path or,
occasionally, leads to unexpected change (Bauer 2003). Anderson (2015: 276)
describes this kind of phenomenon as: “where the regular continuation of some
form would be expected to undergo some re-shaping by sound change, but instead
it is found to have been re-made to conform to some structural pattern.
This is what we usually mean by ‘Analogy’”. Rainer (2013) cites an example from
the history of Spanish. A regular vowel change that happened between Latin and
Spanish is the lowering of /i/ to /e/. Some examples of this change can be seen
in (5):


(5) a. pilum → pelo ‘hair’
    b. istum → esto ‘this’


According to this phonological rule, from Latin sinistrum ‘left’ the expected
Spanish form would be sinestro, but because of analogy with the existing Spanish
form diestro ‘right (handed)’, it became siniestro ‘sinister’. This is a single case
analogical process at work: because of semantic and phonological similarities to
an existing word, a word fails to undergo a regular phonological change.

A related phenomenon is called contamination (Paul 1880: 160), which happens
when two elements are so semantically similar that a new element with
properties of both is created by speakers. As an example Paul mentions the German
formation Erdtoffel ‘potato’ made out of Kartoffel and Erdapfel (both also
meaning ‘potato’), and Gemäldnis ‘painting’ formed from Bildnis ‘portrait’ and
Gemälde ‘painting’. Some of these innovations are sporadic, but some can remain
in the language.

¹ The same phenomenon is also found in Spanish with aterrizar ‘to land on earth’, alunizar ‘to
land on the moon’, etc.

Although most studies have focused almost exclusively on morphological and
phonological phenomena, there has been some recent work on syntactic analogical
change (De Smet & Fischer 2017). In syntax the idea is the same: a given
syntactic construction changes, or fails to change, by analogy to some other (usually
more frequent) syntactic construction. In syntax, however, it is much harder
to be certain that some change was due to analogical relations. A relatively recent
(Colombian) Spanish innovation is [lo más de X_Adj] (‘the most of X’, meaning
‘quite X’), shown in (6):


(6) [lo más bonito] + [de lo más bonito] → [lo más de bonito]
    the more pretty    of the more pretty    the more of pretty
    ‘the prettiest’ + ‘(one) of the prettiest’ → ‘quite pretty’


Here we see that the [lo más de X_Adj] construction is a sort of blend between
two different constructions, but has a unique and different meaning from the
original constructions.

Comprehensive discussions of the role of analogy in language change and his- 
torical linguistics can be found in Anttila (2003), Hock (1991; 2003), Trask (1996) 
and, of special historical relevance, Paul (1880). 

Finally, it is important to mention that single case analogy is usually thought 
of as a cognitive process and not as a description of a system property. Single 
case analogy is about what speakers do when new forms are coined, single items 
regularize, or when some predictable phonological change fails to apply in some 
specific cases. This kind of analogy will not be discussed in this book. 


2.1.2 Proportional analogies 


A different kind of analogy is termed proportional analogy. In its simplest form,
proportional analogy involves four elements, such that A:B = C:X, that is, A is to B
as C is to X. The idea here is that we can find X by looking at the relation between A
and B. The earliest mention of this kind of analogy is in Aristotle's Poetics:


By ‘analogical’ I mean where the second term is related to the first as the
fourth is to the third; for then the poet will use the fourth to mean the
second and vice versa. And sometimes they add the term relative to the
one replaced: I mean, for example, the cup is related to Dionysus as the
shield is to Ares; so the poet will call the cup ‘Dionysus’ shield’ and the
shield ‘Ares’ cup’; again old age is to life what evening is to day, and so he
will call evening ‘the old age of the day’ or use Empedocles’ phrase, and call
old age ‘the evening of life’ or ‘the sunset of life’. (Russell & Winterbottom
1989: Chapter III)


This is a rather old concept, which has also been used in linguistics extensively,
most notably in morphology but also in historical linguistics (Paul 1880). This
kind of analogy is often present in word-based theories of inflection and derivation,
where fully inflected forms are related to each other by proportional analogies,
instead of operations deriving inflected forms from stems (Blevins 2006;
2008; 2016). Blevins (2006: 543) gives an example from Russian, with the nouns
škola ‘school’ and muščina ‘man’ in the nominative and accusative as in (7).


(7) Analogical deduction
    a. škola:školu = muščina:X
    b. X = muščinu


Example (7) illustrates that if we know that for the nominative form škola
there is an accusative form školu, then we can infer that for the nominative form
muščina there will be an accusative form muščinu. Word-based and exemplar-based
theories of morphology usually assume that the whole inflectional (and
sometimes derivational) system of a language works as a system of analogies
between known forms. This also implies that proportional analogy can (and should)
be extended to sets. For example, it is not just the relation škola-školu which
determines the relation muščina-muščinu, it is rather the whole set of
nominative-accusative pairs speakers know.
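
The inference in (7), and its extension to a set of known pairs, can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the discussion above: the relation between A and B is reduced to a substitution of word endings, every stored pair casts one vote, and the function names (`suffix_rule`, `solve`) and the toy word list are hypothetical.

```python
from collections import Counter

def suffix_rule(a, b):
    """Return (old, new): the ending of a and the ending of b that differ."""
    i = 0
    while i < min(len(a), len(b)) and a[i] == b[i]:
        i += 1
    return a[i:], b[i:]

def solve(pairs, c):
    """Solve A:B = C:X for X, letting every known pair A:B cast a vote."""
    votes = Counter()
    for a, b in pairs:
        old, new = suffix_rule(a, b)
        if c.endswith(old):
            # Replace the ending `old` of c with `new` to get a candidate X.
            votes[c[:len(c) - len(old)] + new] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy nominative:accusative pairs (assumed data, for illustration only).
known = [("škola", "školu"), ("kniga", "knigu"), ("stol", "stol")]
print(solve(known, "muščina"))
```

With this toy list, two of the three stored pairs support replacing final -a with -u, so the set as a whole, rather than any single pair, selects the candidate for X.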

The use of proportional analogies has not been limited to inflectional morphology.
There are several proposals for derivational morphology. Singh & Ford
(2003) propose a model in which derived words and simplex forms are related to
each other by proportional analogies and not through morphemes or rules (see
Singh et al. 2003 for several related papers, also Neuvel 2001). In this approach,
formations like Marx:Marxism = Lenin:Leninism are not related by a morpheme
-ism, but by direct analogies as shown in (8):


(8) /XName/ → /Xizm/


However, it is not completely clear how this differs from theories like Booij's 
Construction Morphology (Booij 2010), where this exact kind of relation is ex- 
pressed by a construction in a very similar manner as in (9): 


(9) [Xname-ism] > [pertaining to SEM(X)] 


Booij (2010: 88) suggests the difference between analogy of this kind and constructions
is a gradient one, but without a clear formalization it is hard to evaluate this
claim. This is a common issue with the use of proportional analogies to model 
some (or all) of morphology. These proposals are rarely, if ever, properly formal- 
ized (a notable exception is Beniamine 2017), and it is not always clear how they 
differ from rules. From a purely non-cognitive perspective, it is not obvious what 
it means to say that there are no morphemes or rules, but only analogies between 
whole forms. The real difference seems to be in the assumptions about mental
representation and the need for rich storage of fully inflected forms.

One possible clear distinctive feature of proportional analogy approaches is
the existence of bidirectional relations, not usually assumed in other kinds of ap- 
proaches to morphology. Proportional analogies can usually go in any direction, 
from any cell in a paradigm to any other cell and from a member of a deriva- 
tional family to any other of its members. This property also means that there is 
no need for an arbitrary partition of words into stems and markers/morphemes, 
but the rules can look at whole words. 

The lack of computational implementations of these proposals means that we 
cannot really evaluate how well word-based models perform at a larger scale. 
Although very appealing for their simplicity, it is possible that models solely 
based on proportional analogies cannot capture certain parts of morphology. In 
the end, we require a precise system that produces the X in the analogical equa- 
tions, and this usually boils down to some sort of phonological rule set. This 
is not to say that there has been no work on computational implementations 
of proportional analogies. On the contrary, there is extensive literature on how 
proportional analogies can be modelled computationally (Federici et al. 1995; Fer- 
tig 2013; Goldsmith 2009; Lepage 1998; Pirrelli & Federici 1994b,a; Yvon 1997). 
An extensive discussion of this work is not possible, but two issues are worth 
mentioning. First, most work on computational implementations of analogy fo- 
cuses on languages like English, Italian or Spanish. This means that it is unclear 
how well these systems generalize to phenomena not found in Indo-European 
languages (e.g. phenomena like non-concatenative morphology, tonal processes 
found in African languages, etc.). Second, well formalized, computational imple- 
mentations of proportional analogies tend to only cover some part of a language 
or address some specific task. I am not aware of a computational model of propor- 
tional analogies which covers all of derivation and inflection of some language. 

A different kind of phenomenon also modeled with proportional analogies
is paradigm leveling. Paradigm leveling is the process by which irregular or
alternating forms in the paradigm of a verb become homogeneous. A simple recent
example is the superlative of fuerte ‘strong’ in Spanish. The original form in
19th century Spanish was fortísimo ‘very strong’, but it eventually turned into
fuertísimo during the 20th century. The idea is that proportional analogies with
bueno:buenísimo ‘good’,² puerco:puerquísimo ‘dirty’, etc., would cause the change.

A generalization of this kind of process can be seen in the development of
paradigm uniformity in language change (see Albright (2008a) for a review).
Albright (2008a: 144) gives the example of the eu ~ ie alternations in New High
German in Table 2.1.³


² The form bonísimo existed until around the 19th century. The assumption is that this form also
regularized on the basis of other analogies at the time.

³ As marked by Albright (2008a: 144), in the example > represents a regular sound change while
= represents a form that has been replaced by an analogical process.

Table 2.1: Middle High German to Early New High German


‘to fly’   Middle High German      Early New High German      New High German

1SG        vliuge              >   fleuge                 =   fliege
2SG        vliugest            >   fleugst                =   fliegst
3SG        vliuget             >   fleugt                 =   fliegt
1PL        vliegen             >   fliegen                >   fliegen
2PL        vlieget             >   fliegt                 >   fliegt
3PL        vliegen             >   fliegen                >   fliegen


The singular and plural forms had different diphthongs in MHG and ENHG, 
but in the change to NHG the singular and plural stems became identical. The 
claim is that because of an analogical process with the rest of the paradigm, the 
eu forms for the singular cells of the paradigm were replaced by ie forms to make 
the paradigm more uniform. This goes beyond single case analogies, but it can
still be seen as a regularization produced by proportional analogies, in the sense that
the leveling increases the scope of a proportional analogy, making it more useful
for speakers.

Proportional analogies are not really a process. Unlike the kinds of analogies 
discussed in the previous subsection, proportional analogies hold independently 
of speakers and cognitive processes. Proportional analogies hold, for example, for 
morphological paradigms of dead languages no longer spoken. But proportional 
analogies can motivate a leveling process in a paradigm, as with the examples in 
Table 2.1. 


2.1.3 Analogical classifiers 


A superficially similar, but distinct type of analogy is what I will call analogical 
classifiers. Analogical classifiers are assumed to be responsible for disambiguat- 
ing between two alternatives for some lexical item. Languages often exhibit in- 
stances where a given lexeme has to be assigned to a certain category or class, 
or must receive some feature, but this assignment does not directly follow from 
other morphosyntactic properties of said lexemes. In such cases, speakers are 
faced with a choice between two or more categories (or processes or features 
or classes, etc.) that could apply to this item and they must choose from several
alternatives. Since speakers do make a choice, and usually there is agreement
about what the right choice is, there must be a mechanism in place that
disambiguates between the alternatives. This mechanism is analogical if it is based on
similarity relations between the item that needs to be classified and other items
for which class assignment is known. This is the type of analogy I will focus on 
in the remainder of this book. 
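
A minimal nearest-neighbour sketch may help to make this notion concrete. Everything specific in it is an assumption made for illustration: similarity is measured as the number of segments two forms share at their right edge, the class is decided by a majority vote among the k most similar stored items, and the small lexicon of Spanish nouns with their gender is a toy data set. It is not the model used in the case studies, which Chapter 4 describes as neural networks.

```python
from collections import Counter

def shared_ending(a, b):
    """Number of segments two forms share, counting from the right edge."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n

def classify(item, lexicon, k=3):
    """Assign item the majority class among its k most similar neighbours."""
    ranked = sorted(lexicon,
                    key=lambda entry: shared_ending(item, entry[0]),
                    reverse=True)
    votes = Counter(cls for _, cls in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical lexicon: (form, class) pairs whose class assignment is known.
lexicon = [
    ("casa", "fem"), ("mesa", "fem"), ("rosa", "fem"),
    ("libro", "masc"), ("perro", "masc"), ("carro", "masc"),
]
print(classify("masa", lexicon))
print(classify("burro", lexicon))
```

The point of the sketch is only that the class of a new item is decided by its similarity to items whose class is already known. The choice of similarity measure and of k is open, and nothing here is a claim about what speakers do.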

The previous sections showed that analogy is sometimes understood as a process
that speakers use. Analogical classifiers are different: here we do not deal with
a process, but with a system of relations. As we will see, analogical
classifiers can be implemented with the help of various techniques, but this does
not mean that the techniques we use to build analogical classifiers have a direct
relation to what speakers do. Whether they do is so far an open question, and I will
not attempt to answer it here.

Analogical classifiers are a relatively popular area of research among both 
formal and cognitive linguists. The role of phonological conditions on morpho- 
logical processes and allomorphs has been acknowledged for quite some time 
(Kuryłowicz 1945; Bybee & Slobin 1982; Carstairs 1990) as well as the role of se- 
mantic factors (Malkiel 1988) on similar processes. This is usually known in gen- 
erative grammar as allomorphy (Nevins 2011) and in usage-based and cognitive 
linguistics as analogy (Bybee & Slobin 1982). Despite some apparent terminological
disagreements, and despite the fact that both communities tend to ignore each
other, phonologically conditioned allomorphy and analogy (in the sense of
analogical classifiers) are not different kinds of phenomena. In both cases, we are
dealing with alternations between multiple alternatives, which are resolved on 
the basis of phonological and semantic factors. 

Analogy as a classifier lies in strong opposition to proportional analogies, how- 
ever. As explained in the previous subsection, according to a model of propor- 
tional analogies, given some form C for which we want to find a corresponding 
X, we infer X by looking at items A similar to C for which we know B. This 
approach tries to avoid an abstraction step, namely the use of classes. 

Given the basic proportional analogy formula A:B = C:X, the association between
A and B is direct and thus the association between C and X must also
be direct. But this does not need to be the case: the association between A and B
can be mediated by an intermediate abstract feature. To make things clearer,
we look at some concrete examples. Tables 2.2-2.5 show the inflection classes -a,
-ja, -o and -jo for Gothic nouns (Braune 1895).⁴


⁴ In class -ja, /ji/ can contract to /ei/ on long stems.

Table 2.2: Gothic -a declension class


        ‘day’                                ‘bread’
        Singular        Plural               Singular         Plural

NOM     dags     -s     dagōs     -ōs        hlaifs    -s     hlaibōs    -ōs
ACC     dag      -Ø     dagans    -ans       hlaif     -Ø     hlaibans   -ans
GEN     dagis    -is    dagē      -ē         hlaibis   -is    hlaibē     -ē
DAT     daga     -a     dagam     -am        hlaiba    -a     hlaibam    -am

Table 2.3: Gothic -ja declension class


        ‘army’                               ‘herdsman’
        Singular        Plural               Singular         Plural

NOM     harjis   -jis   harjōs    -jōs       haírdeis  -eis   haírdjōs   -jōs
ACC     hari     -i     harjans   -jans      haírdi    -i     haírdjans  -jans
GEN     harjis   -jis   harjē     -jē        haírdeis  -eis   haírdjē    -jē
DAT     harja    -ja    harjam    -jam       haírdja   -ja    haírdjam   -jam


Table 2.4: Gothic -o declension class


        ‘gift’
        Singular        Plural

NOM     giba     -a     gibōs     -ōs
ACC     giba     -a     gibōs     -ōs
GEN     gibōs    -ōs    gibō      -ō
DAT     gibái    -ái    gibōm     -ōm

Table 2.5: Gothic -jo declension class


        ‘band’
        Singular        Plural

NOM     bandi    -i     bandjōs   -jōs
ACC     bandja   -ja    bandjōs   -jōs
GEN     bandjōs  -jōs   bandjō    -jō
DAT     bandjái  -jái   bandjōm   -jōm


If we only consider these four classes, we can find proportional analogies that
help predict most cells. For example, knowing the dative plural form haírdjam
‘herdsman’ is enough to know that its genitive plural form must be haírdjē.
However, some cells are not fully determined. Knowing that gibōs ‘gift’ is a
nominative plural is not enough for us to determine that the nominative singular should
be giba and not gibs, by analogy with dagōs ‘day’.⁵,⁶

From the perspective of analogical classifiers, the alternative is that the inflec- 
tion class completely determines all cells of the paradigm of any lexeme. The indi- 
vidual cells, in turn, carry information about the inflection class. The distinction 
might seem trivial, but it requires an important abstraction step. From the
analogical classifier perspective, the form haírdjam uniquely determines that haírd
belongs to class -ja and, similarly, the form gibōs should uniquely determine that
gib belongs to class -o. Examples (10) and (11) schematically represent how each
approach works. 


(10) Proportional analogy
     a. harjam:harjē = haírdjam:X
     b. X = haírdjē
(11) Analogical classifier
     a. harjam ∈ class -ja
     b. haírdjam ∈ class -ja
     c. GEN.PL, class -ja, haírd → haírdjē


⁵ Arguably, in a completely word-based approach there would also be confounding analogies
with bandjōs.

⁶ This situation where a cell in a paradigm only partially helps to predict another cell has been
approached from an information theoretic perspective (Moscoso del Prado Martín et al. 2004;
Ackerman & Malouf 2013; Blevins 2013; Ackerman & Malouf 2016; Bonami & Beniamine 2016).
This approach measures the conditional entropy between cells in a paradigm, and thus quantifies
how informative different cells are about each other. In this book I pursue a different approach
using accuracy measures.

While proportional analogies link forms to forms, analogical classifiers link
forms to classes. Nevertheless, both analogical classifiers and proportional anal- 
ogy models share the core idea that new forms can be generated by making ref- 
erence to stored forms. 
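
The contrast between (10) and (11) can be made concrete with a small sketch based on the endings in Tables 2.2-2.5. The data structure and function names (`ENDINGS`, `classes_for`, `realize`) are hypothetical, only three cells per class are encoded, and preferring the longest matching ending is a simplifying assumption rather than part of the analysis. The classifier route first maps a form to the classes compatible with it, and only then realizes another cell from the stem and the class.

```python
# Endings for three cells, taken from Tables 2.2-2.5 (other cells omitted).
ENDINGS = {
    "-a":  {"NOM.PL": "ōs",  "GEN.PL": "ē",  "DAT.PL": "am"},
    "-ja": {"NOM.PL": "jōs", "GEN.PL": "jē", "DAT.PL": "jam"},
    "-o":  {"NOM.PL": "ōs",  "GEN.PL": "ō",  "DAT.PL": "ōm"},
    "-jo": {"NOM.PL": "jōs", "GEN.PL": "jō", "DAT.PL": "jōm"},
}

def classes_for(form, cell):
    """Form -> {class: stem}, keeping only the longest matching endings."""
    hits = {c: e[cell] for c, e in ENDINGS.items() if form.endswith(e[cell])}
    if not hits:
        return {}
    longest = max(len(e) for e in hits.values())
    return {c: form[:len(form) - len(e)]
            for c, e in hits.items() if len(e) == longest}

def realize(stem, cls, cell):
    """Class -> form: the class determines the ending of every cell."""
    return stem + ENDINGS[cls][cell]

# (11): haírdjam is assigned to class -ja, which then fixes the GEN.PL form.
for cls, stem in classes_for("haírdjam", "DAT.PL").items():
    print(cls, realize(stem, cls, "GEN.PL"))

# The ending of gibōs is shared by two classes in this toy table.
print(sorted(classes_for("gibōs", "NOM.PL")))
```

The second call illustrates the underdetermination noted above for the nominative plural: on the basis of the ending alone two classes remain open, so a classifier would need further information about the item to decide between them.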

For simple cases like the Gothic examples above, there is empirically no differ- 
ence between the approaches, and from a complexity perspective the analogical 
classifier requires extra components. On the other hand, analogical classifiers 
have certain advantages. The first one is that analogical classifiers are compati- 
ble with most, if not all, morphological theories. Meanwhile, models that make 
use of proportional analogy are usually their own theories of morphology. This 
means that accepting insights from analogical classifiers does not require giv- 
ing up on other theoretical concepts (e.g. stems, rules of impoverishment or con- 
structions). Additionally, from a historical perspective, analogical classifiers have 
been argued to be more accurate in describing linguistic change. According to 
Bybee & Beckner (2015: 506), constructions are responsible for licensing actual 
inflected forms, while analogies are responsible for licensing the combination 
of the aforementioned schemata with new lexical items: “given the productive
schema [[VERB] + ed ] past, a new verb is added to the schematic category and
that verb thereby becomes regular”, and it is an analogical classifier which
assigns a new verb to this schema. Bybee & Beckner (2015) argue that class
assignment ‘categorization’ is more important than pure proportional analogies in
many cases of historical development. As an example the authors propose the 
verbs strike and dig, which ended up in the class of verbs like cling, swing, hang, 
etc. even though they do not actually match the schemas that describe this class 
(see next section for a discussion of this case). The argument is that proportional 
analogies did not actually take place, but speakers simply assigned these verbs 
to the V~u class: swing~swung (compare however De Smet & Fischer (2017) and 
Fertig (2013) for alternative views on the matter of analogical regularization). 

This sort of change is relatively common. Single regular items might be recat- 
egorized as belonging to some irregular class, or irregular items might become 
regularized. Whenever there is a change in markers it tends to happen across the 
board, applying to all items of a class. This behaviour of inflection classes seems 
more compatible with a categorization system where class assignment and
morphological realization are independent from each other, than with a system where
they are handled by the same process.

All this being said, I will not focus on the distinction between analogical
classifiers and proportional analogy models. Although I focus exclusively on
analogical classifiers, some of the results from the case studies might also apply to
models of proportional analogy.


2.1.4 Summing up


I have discussed three types of analogies that have been proposed in the linguistic
literature: single case analogies, proportional analogies and analogical classifiers.
Although very different from each other, these three types of analogy all
share the property of being processes or relations which: (i) focus on similarities
between groups of items and (ii) allow for very fine-grained generalizations. As
already mentioned, I will only discuss analogical classifiers in this book.
Integrating single case analogy with theories of formal grammar will remain an open
problem.

Particularly within morphology and phonology, analogical classifiers (compu- 
tational and non-computational) have been proposed for a variety of languages: 
Dutch (Krott et al. 2001), English (Bybee & Slobin 1982; Arndt-Lappe 2011; 2014), 
German (Hahn & Nakisa 2000; Motsch 1977; Köpcke 1988; 1998b; Schlücker &
Plag 2011), Catalan (Vallés 2004; Saldanya & Vallés 2005), French (Holmes &
Segui 2004; Lyster 2006; Matthews 2005; 2010), Polish (Czaplicki 2013), Romanian
(Dinu et al. 2012; Vrabie 1989; 2000), Russian (Kapatsinski 2010; Gouskova
et al. 2015), Spanish (Afonso et al. 2014; Eddington 2002; 2004; 2009; Pountain 
2006; Rainer 1993; 2013; Smead 2000), Navajo (Eddington & Lachler 2006), Zulu 
(O'Bryan 1974), as well as more theoretically oriented work (Skousen 1989; Sk- 
ousen et al. 2002; Skousen 1992) among many others. It is not possible to discuss 
all, or even the majority, of these works here. In the following sections, I will 
address some of the most relevant studies. In addition, the case studies in Part II 
discuss some of the previous models that have tackled the phenomena in ques- 
tion. 


2.2 The mechanism for analogy 


So far I have not discussed what the mechanism for implementing the similarity 
relations in analogical classifiers actually is. As this is not the most crucial issue 
for the topic at hand, I will not be concerned with the question of the advantages 
and disadvantages of the different techniques. I will also not address the question 
of psycho-linguistic plausibility or mental representation. These are, no doubt, 
important empirical issues, but they are ultimately tangential to the aim of this 
book. In this section I will present a brief overview of different systems that have 
been previously proposed and argue for the method I have chosen for the case 
studies in Part II. 

In the literature there are four types of proposals for what the process behind 
analogy (understood as analogical classifiers) could be. These are listed in (12):


(12) a. simple, contextual rules;
     b. schemata;
     c. multiple-rule systems; and
     d. computational statistical models


Many of the studies that have used one or the other also argued for why the 
alternatives are inferior or not to be preferred (Albright & Hayes 2003; Yaden 
2003; Eddington 2000; Gouskova et al. 2015). I will argue instead that, leaving 
the point about cognitive representation aside, the systems in (12) are all more or 
less the same. The small differences we find between these four approaches are 
rather minor, and, in principle, one can almost always translate from one to the 
other. 


2.2.1 Simple rules 


Contextual rules are probably the oldest implementation of analogical classifiers, 
but they are also not associated with the word analogy very often. Contextual 
rules are commonly found in phonology (Chomsky & Halle 1968 and Goldsmith 
et al. 2011 among many others), but can be used for pretty much any domain. The 
format of contextual rules is usually P / c, where P stands for some process and
c stands for a given context. Of course, not all uses of contextual rules count as
analogical classifiers, but this does not prevent the implementation of an analogical
classifier by using contextual rules. We can easily convert the format above
into C / F, where C stands for a class and F for a feature, meaning that if an item
has some feature F it then belongs to class C.

Phenomena that can be described in this manner are usually very small (in 
number of classes) and the generalizations tend to be rather straightforward. One 
well known example in the literature is the nominative marker in Korean (Lee
1989; Song 2006). Korean nouns take the nominative marker -i after consonants
and -ka after vowels as seen in (13):


(13) a. mom-i ‘body.NOM’
     b. kanhowen-i ‘nurse.NOM’
     c. nay-ka ‘I.NOM’
     d. kʰo-ka ‘nose.NOM’


The actual distribution of this particle is more complex than just a nominative marker. See
Song (2006) for a thorough description of its morphosyntactic properties.

Based on grammatical descriptions, there do not seem to be any exceptions to
this rule. One could model this behaviour in terms of rules as illustrated in (14).


(14) a. -i / ... C#
     b. -ka / ... V#


But this is not a classifier. This is rather a morphological process that takes
into account the phonological context under which it can apply. To model this
phenomenon with an analogical classifier we simply propose two noun inflection
classes for Korean: class-i and class-ka. Nouns belonging to class-i take the
marker -i in the nominative, while nouns that belong to class-ka take -ka in the
nominative. Then, the rules in (15) assign nouns to either class:


(15) a. class-i / ... C#
     b. class-ka / ... V#


This might look like we have simply rewritten the same statement in a different way,
but it shows that analogical classifiers can easily handle simple regular cases
of phonologically determined allomorphy. It also shows that simple contextual
rules can be used to implement analogical classifiers without difficulty.
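
As a minimal sketch (not part of the original analysis), the two steps described above, class assignment by the rules in (15) followed by realization of the marker from the class, could be written in Python as follows. The hand-segmented stems and the small vowel inventory are simplifying assumptions made only for this illustration.

```python
# Hypothetical, simplified vowel inventory for the romanised examples in (13).
VOWELS = {"a", "e", "i", "o", "u", "ay"}

# Each class is tied to the marker its members take in the nominative.
MARKERS = {"class-i": "i", "class-ka": "ka"}


def assign_class(segments):
    """Apply the rules in (15): C-final stems -> class-i, V-final -> class-ka."""
    if segments[-1] in VOWELS:
        return "class-ka"  # (15b) class-ka / ... V#
    return "class-i"       # (15a) class-i / ... C#


def nominative(segments):
    """Realise the nominative from the class, not directly from the phonology."""
    return "".join(segments) + "-" + MARKERS[assign_class(segments)]


# Stems from (13), pre-segmented by hand (an assumption of this sketch).
stems = [
    ["m", "o", "m"],
    ["k", "a", "n", "h", "o", "w", "e", "n"],
    ["n", "ay"],
    ["kh", "o"],
]
for stem in stems:
    print(nominative(stem))
```

The point of the split into `assign_class` and `nominative` is only to make visible that the classifier and the morphological realization are separate steps, even though for Korean they could be collapsed into one.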

Although the Korean example is completely regular, this is rarely the case in 
allomorphy. The seemingly simple plural system in Spanish is a good example to 
illustrate this. Spanish nouns can end in vowels (gato ‘cat.MASC’) or consonants
(baúl ‘trunk’), but not glides. The plural morpheme in Spanish has two main allomorphs:
-s and -es, which are almost always predictable from the final segment
of the singular form of the noun, as can be seen in (16) and (17):


(16) a. class-s / ... V#
     b. class-es / ... C#

(17) a. gatos
     b. baúles


However, it is easy to find systematic exceptions to this simple rule. One
kind of exception is found in relatively recent English loanwords: (e)sticker
– (e)stickers ‘sticker’, snicker – snickers, as well as with older French loanwords:
cabaret – cabarets ‘cabaret’, carnet – carnets ‘ID card’. Less systematic exceptions
occur in words with atypical phonotactic patterns such as ají ‘chili pepper’ or colibrí
‘hummingbird’, which can take several different plural forms: ajís/ajíes/ajises
and colibrís/colibríes. These are atypical because Spanish words do not usually
end in a stressed /i/, but they are systematic in the sense that other words with
this same ending would also allow for at least two different allomorphs (e.g.
manatí – manatís/manatíes ‘manatee’). This set of additional contexts could also
be captured by additional rules:


(18) a. class-es / ... í#
     b. class-s / ... et#
     c. class-s / ... ker#


Alternatively one could define only -i / ... C# as contextual, and -ka as default, or the other way
around.

Since this word is still in its early stages of borrowing there is no established orthography, but
the pronunciation is /estiker/.


Additional (exception) classes would also be needed for markers like -ses: ajises
‘chili peppers’, doceses ‘twelves’. What this Spanish example shows is that even
apparently simple cases might have some hidden complexity. In the end, however,
contextual rules can be used to build a classifier that captures the system.
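
To make the last claim concrete, here is a rough sketch of such a classifier in Python. It is an illustration under stated assumptions, not a proposal from the literature: orthographic endings stand in for the phonological contexts, the specific contexts are simply listed before the general ones and the first match wins, and the variable plurals (ajís/ajíes) are reduced to the single class the corresponding rule assigns.

```python
import re

# Ordered (context, class) pairs. The specific contexts are listed first so
# that they pre-empt the general V#/C# rules; the first match wins.
RULES = [
    (r"í$", "class-es"),              # ... í#   (ají, colibrí, manatí)
    (r"et$", "class-s"),              # ... et#  (cabaret, carnet)
    (r"ker$", "class-s"),             # ... ker# ((e)sticker)
    (r"[aeiouáéíóú]$", "class-s"),    # ... V#   (gato)
    (r"[^aeiouáéíóú]$", "class-es"),  # ... C#   (baúl)
]


def plural_class(noun):
    """Return the class assigned by the first rule whose context matches."""
    for context, inflection_class in RULES:
        if re.search(context, noun):
            return inflection_class
    raise ValueError(f"no rule matches {noun!r}")


for noun in ["gato", "baúl", "cabaret", "sticker", "ají"]:
    print(noun, plural_class(noun))
```

Reordering the list so that the general rules come first would send cabaret to class-es and ají to class-s, which is one way of seeing that the exceptions are carried entirely by the added contexts and their priority.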

Phonologically conditioned allomorphy is a well known problem and there
are many examples in the literature (Alber 2009; Anderson 2008; Baptista &
Silva Filho 2006; Booij 1998; Carstairs 1998; Malkiel 1988; Rubach & Booij 2001);
a recent review is given by Nevins (2011). However, the generative literature
has almost exclusively focused on cases where the phonological conditioning is 
straightforward and can be written as a set of rules or constraints, ignoring those 
cases where there are no simple rules that can account for the phenomenon. 

There are several reasons why phonologically conditioned allomorphy presents 
difficulties for traditional grammar theories. The main one is that this is a phe- 
nomenon which seems to be completely unmotivated and which adds unneces- 
sary complexity to the grammar. The second reason is that many cases do not 
seem to follow any sort of clear rule pattern (although as we will see, if one 
looks closely enough, this is not the case). The lack of clear patterns means that 
the rules in the grammar must make reference to arbitrary features or ad hoc
constraints.


2.2.2 Schemata 


The previous subsection showed that Spanish plural formation, although rela- 
tively simple, is not uniquely determined by one single rule, but rather by several 


One clarification would have to be added regarding additional exceptions like caset, plural
casetes/casets, where the system seems to have added a more regular plural.
rules that make reference to the different endings of nouns. With this example in
mind, one might ask how specific the phonological environment can be, and how 
many different possible environments there can be that determine a given alter- 
nation. There is no theory-internal or theoretically motivated answer to this ques- 
tion. In principle, the context of a rule could make reference to many segments, 
and one could have a system with dozens of different contexts. While the formal 
literature talks about rules, the usage-based literature talks about schemata. 

To illustrate this, we will look at the phenomenon probably most often dis- 
cussed in the literature: irregular verb formation in English. Regular verbs in En- 
glish build their simple past form adding a -t/d marker to the stem. Additionally, 
there are groups of irregulars which do not follow this pattern. Bybee & Slobin 
(1982) showed that the forms in (19) and (20) are not arbitrarily irregular (see also Köpcke
(1998a) for a comparable analysis of German strong verbs) but that there are
schematic properties they all share and that nonce words can be assigned to this
conjugation pattern if they are formally similar enough to other existing items.
Bybee & Slobin (1982) call these similarity relations a schema. For (19) they propose
/...ow#/ ~ /...uw#/, and for (20) /...ɪ(N)K#/ ~ /...ʌ(N)K#/.


(19) a. draw – drew
     b. blow – blew
     c. grow – grew
     d. know – knew
     e. throw – threw

(20) a. stick – stuck
     b. sink – sunk
     c. swing – swung
     d. string – strung


One could suggest more detailed schemata (e.g. make reference to the initial
consonant cluster structure most verbs in (19) seem to share: /CL.../, etc.).

The difference between schemata and rules is not obvious. One factor that 
has been mentioned as distinguishing schemata from rules (and favouring the 
former) is that they interact with prototype theory (Köpcke 1998a). While rules
are blind to what lexical items they apply to, schemata can take into consideration
the prototype of a class. In (20), the prototypes would be swing or string, and new


Where K stands for a velar and N stands for a nasal.
Where L stands for a liquid.
items will be more or less likely to belong to this same class according to how
similar they are to these prototypical items. In a prototype approach to analogy, 
the analogical relation to the prototype(s) of a class is more important than the 
relation to non-prototypical items. In such a system, schemata do not need to be 
completely strict, but specify preferences. They can match items that are not a 
perfect match, but only partially fit them. 

Schemata are usually more specific than rules, and list more phonological ma- 
terial, but this can be emulated equally well by rules. The supposed softness of 
schemata can also be modelled with either more specific, larger sets of rules, or 
with rule weights, as in the following section. 

Croft & Cruse (2004: chapter 11.2-11.3) argue that schemata can be output-oriented,
i.e. they can specify the specific value of a certain output, independently
of what the input would be (see also Bybee 1995). In (20), the output schema
would be [...ʌŋ]past. This schema then groups together all verbs that build their
past form with /ʌŋ/, independently of what their present form/stem is, and what
processes would need to apply to them to form the past form.

It is important to note that output-oriented schemata are a way of generalizing
over inflected forms. However, these kinds of schemata are not classifiers.
From the schema [...ʌŋ]past one cannot know whether a particular verb inflects
according to this schema or not. There needs to be a different mechanism which
links the present tense form with the past tense form, or the lexeme with this
output schema. Therefore it remains unclear whether this kind of schema is
relevant for analogical classifiers.

The difference between schemata and rules is a subtle one, and it usually 
has more to do with cognitive representation and performance. Both rules and 
schemata would need to be formalized before one could establish that they are 
not equivalent. Currently, there is no way of assessing whether the difference 
is spurious. In any case, it is always possible to translate a rule-based system to 
a schema-based system and the other way around. In the end, the use of one 
or the other seems to be more determined by the theoretical background of the 
researcher. Formal linguists usually prefer the use of rules, while cognitive and 
usage-based linguists prefer schemata. 


2.2.3 Multiple-rule systems 


The generalization of simple rule-based systems is the use of multiple-rule sys- 
tems. There is no unified theory of how multiple-rule systems (for the purpose of 
modelling allomorphy) should work. A system could include a specific order of
application, follow Panini's principle, or be entirely ordering agnostic. One can
write rules that only look at endings of words, complete word forms, semantics, 
etc. Rules can be categorical, assign weights, or be probabilistic. Since there is 
no agreement regarding what the properties of these systems should be, I will 
briefly discuss two cases from the literature. 


2.2.3.1 Estonian inflectional classes 


An impressive example of classes modelled with multiple rules is the Estonian
inflectional system. There are around 40 inflection classes for Estonian nouns,
depending on how one counts main classes and subclasses (Erelt et al. 1995; 1997;
Mürk 1997; Blevins 2008), and there is no obvious systematic way of predicting
the class of a noun. Blevins (2008: 242) gives the examples in Table 2.6 to illustrate
the three main Estonian inflection classes (originally in Erelt et al. 2001). These
three classes in turn can be subdivided into further subclasses.


Table 2.6: Main Estonian inflectional classes 


               Class I
               SG        PL          SG        PL
NOM            maja      majad       `lipp     lipud
GEN            maja      majade      lipu      `lippude
PART           maja      majasid     lippu     `lippusid
ILLA2/PART2    `majja    maju        `lippu    lippe
               ‘house’ (3)           ‘flag’ (20)

               Class II              Class III
               SG        PL          SG        PL
NOM            kirik     kirikud     inimene   inimesed
GEN            kiriku    kirikute    inimese   inimeste
PART           kirikut   kirikuid    inimest   -
ILLA2/PART2    -         -           inimesse  inimesi
               ‘church’ (12)         ‘person’ (12)


Panini's principle says that in cases where two rules compete with each other, the more specific
rule will win the competition (Zwicky 1986).

The grave accents indicate overlong syllables. The numbers in brackets indicate the inflectional
subclass given in Erelt et al. (2001).

Table 2.7: Rule system according to Viks (1992)


n. syllables   final sounds   medial sounds   class   coverage (n. nouns)
1              c              0               22      2612
               cUS            0               11      2036


From the examples in Table 2.6 we see that these classes show different mark- 
ers for most cells. Despite its apparent complexity, the inflectional class of a noun 
is highly predictable from its phonological shape (with some exceptions). Viks
(1995) presents a model that can successfully predict the inflectional class of most
Estonian nouns (see also Viks 1994). Viks' model consists of a series of handwritten
rules that make use of three features: number of syllables, final phonemes of
the stem and medial phonemes. Of the final set of 117 rules, 28 alone offer some
73% coverage, while the remaining 89 offer around 27% coverage on their own.
The total set of rules covers 93% of nouns. The main point here is not a detailed
description of all of Viks' rules; the interesting aspect of this system is that a
small set of rules covers a relatively large portion of nouns, while a larger set of
rules is there to account for the rest of the system.

As an example we can see the two rules for nouns in Table 2.7. In the description
of the segments, Viks uses the symbol c to indicate any of the consonants
BDFGHJKLMNPRSŠZŽTV, while capital letters stand for literal letters. The class is a
number as defined in A concise morphological dictionary of Estonian (Viks 1992).

To decide between the many different rules, Viks' (1995) model uses a simple
rule-ordering procedure: “as soon as the first matching rule is found it is implemented
regardless of the following ones”. The rules follow an extrinsic order,
designed to maximize the accuracy of the system. Viks' (1995) model fulfills all 
characteristics of an analogical classifier: it makes use of phonological properties 
of lexemes to assign them an inflection class. 


2.2.3.2 English past tense formation (again) 


A different example of a multiple-rule-based system is discussed by Albright & 
Hayes (2003). In this study, the authors compare three possible models for the 
formation of the past tense in English verbs: (i) a simple rule-based model, (ii) a 


The coverage does not add up to 100% because there is some overlap. 
Notice the class numbers are arbitrary and independent of the rules and rule-ordering.

weighted, multiple-rule-based model, and (iii) an analogical model based on work
by Nosofsky (1990).

The weighted rule-based model proposed by Albright & Hayes (2003) is based 
on the minimal generalization algorithm first proposed in Albright & Hayes 
(1999). The basic idea of this algorithm is as follows. For a given morphological
process that applies to a set of items (in this case past tense formation), the
algorithm first tries to generalize across the set of items and then infers the minimal
rule that captures all items. For example, if the algorithm only sees shine-shined
and consign-consigned, it will make the generalization in Table 2.8.


Table 2.8: Minimal generalization learner 


     change       variable   shared features                 shared segments   change location
a.   ∅ → d / [               ʃ                               aɪn               ___ ][+past]
b.   ∅ → d / [    kən        s                               aɪn               ___ ][+past]
c.   ∅ → d / [    X          [+strident, +contin, −voice]    aɪn               ___ ][+past]


The steps in Table 2.8 show how the minimal generalization algorithm works. 
In the first column, we see the phonological change that needs to be applied to 
the present tense form, in this case adding a /d/. As to the other columns, in (a) 
and (b) we see two individual instances of attested past tense forms with their 
corresponding present tense form. The step in (c) corresponds to the minimal 
generalization of (a) and (b). It assigns an X to the segments which are not common
between both forms, generalizes over /ʃ/ and /s/ in terms of their feature
representation and keeps the shared segments /aɪn/. This is all within the general
context of the operation of forming the past tense.
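
The comparison step in Table 2.8 can be sketched in a few lines of Python. This is a deliberately reduced illustration, not Albright & Hayes' implementation: it handles only a suffixing change with the context to its left, the forms are segmented by hand, and the two-entry feature table (including the anterior values) is a hypothetical stand-in for a full feature system.

```python
# Hypothetical mini feature table; a real system would cover all segments.
FEATURES = {
    "ʃ": {"+strident", "+contin", "-voice", "-anterior"},
    "s": {"+strident", "+contin", "-voice", "+anterior"},
}


def minimal_generalization(a, b, features=FEATURES):
    """Compare two stems (lists of segments) that undergo the same change.

    Returns (variable, shared_features, shared_segments), reading the
    context leftwards from the change location at the end of the stem.
    """
    # 1. Shared segments: the longest common suffix of the two stems.
    k = 0
    while k < min(len(a), len(b)) and a[-1 - k] == b[-1 - k]:
        k += 1
    shared_segments = a[len(a) - k:]
    rest_a, rest_b = a[:len(a) - k], b[:len(b) - k]

    # 2. Shared features: intersect the features of the next segment left.
    shared_features = set()
    if rest_a and rest_b:
        shared_features = (features.get(rest_a[-1], set())
                           & features.get(rest_b[-1], set()))
        if shared_features:
            rest_a, rest_b = rest_a[:-1], rest_b[:-1]

    # 3. Variable: whatever is left over in either form becomes X.
    variable = "X" if (rest_a or rest_b) else ""
    return variable, shared_features, shared_segments


shine = ["ʃ", "aɪ", "n"]
consign = ["k", "ə", "n", "s", "aɪ", "n"]
print(minimal_generalization(shine, consign))
```

For shine and consign the function returns the variable X, the three features common to /ʃ/ and /s/, and the shared segments /aɪn/, which corresponds to row (c) of the table.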

After this process is iterated, the algorithm arrives at a series of rules, of dif- 
ferent degrees of generality, that cover the attested items. Using the accuracy of 
the rules and their coverage (how many items they apply to), the model then 
calculates weights for these rules. The weights allow the model to infer degrees
of confidence for each rule and for the forms derived from them. This model can
thus emulate, to a certain extent, the schemata proposed by Bybee & Slobin (1982),
in that the clusters of similarity like fling-flung, sting-stung, cling-clung can be
captured by small rules that specifically apply to them. For these three items,
the minimal generalization learner produces the rule: /ɪ/ → /ʌ/ / [[−voice] l ___
ŋ][+past]. For the larger, more general set that adds win, swing, dig, spring,
spin, sting, wring, string, the model has the more general rule: ɪ → ʌ / [X C ___
[+voice, −continuant]][+past]. And so on for the other cases. With these sets of
rules, Albright and Hayes's model predicts that there should be "islands of relia- 
bility" in the irregular past tense, where verbs that look alike, by conforming to 
the context of the rules, will behave according to said rules. 

To evaluate their model against the purely analogical model, Albright & Hayes 
(2003) performed two wug experiments where they asked speakers to produce 
the past tense of nonce verbs. These words were selected either to belong or not
to belong to the islands of reliability predicted by their model. The authors compared
the responses given by the speakers with the probabilities predicted by the
three different models. In the end, the weighted multiple-rule-based model outperformed
the other computational models, including a multiple-rule-based model that did not
include weights.

Since Albright & Hayes' (2003) original model works from inflected forms to 
inflected forms, it is not, in the strict sense, an analogical classifier. However, the 
minimal generalization learner as a method for inferring rules could easily be 
deployed in an analogical classifier. An important aspect of Albright & Hayes'
(2003) system is that the rules it produces are weighted rules, unlike the rules
in Viks' (1994) system. This also means that there is no rule-ordering but weight
comparison. If two different rules make different predictions for the same input
lexeme, the prediction with the highest weight wins. Rule weights correspond,
to a certain extent, to the idea of prototypes in the schema-based model. Rules
with higher weights capture the more prototypical shapes in the system.
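
The contrast between rule ordering and weight comparison can be illustrated with a small Python sketch. Everything specific in it is an assumption made for the example: the two rules, their orthographic contexts and their counts are invented, and the weight is the raw proportion of hits within a rule's scope, which simplifies the confidence-adjusted weights used by Albright & Hayes (2003).

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    context: str  # regular expression over the (orthographic) stem
    outcome: str  # class or change predicted by the rule
    hits: int     # items in scope for which the outcome is correct
    scope: int    # items whose form matches the context

    @property
    def weight(self):
        # Simplification: raw reliability, with no confidence adjustment.
        return self.hits / self.scope


# Invented counts, for illustration only.
RULES = [
    Rule(context=r"[^aeiou]ling$", outcome="vowel-change", hits=9, scope=10),
    Rule(context=r"", outcome="add-ed", hits=80, scope=100),
]


def classify(stem, rules=RULES):
    """Return (outcome, weight) of the highest-weighted matching rule."""
    matching = [rule for rule in rules if re.search(rule.context, stem)]
    if not matching:
        return None
    best = max(matching, key=lambda rule: rule.weight)
    return best.outcome, best.weight


print(classify("spling"))  # both rules match; the specific one has more weight
print(classify("walk"))    # only the general rule matches
```

Because the decision depends only on the weights, the order in which the rules are listed plays no role, unlike in an extrinsically ordered system.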


2.2.4 Neural networks and analogical modelling 


Two of the main computational implementations of analogy, and the ones I will 
focus on in this section, are neural networks and Analogical Modelling (AM). 
The use of neural networks in linguistics has a relatively long history (Bechtel & 
Abrahamsen 2002; Churchland 1989; McClelland & Rumelhart 1986; Rumelhart 
& McClelland 1986a,b). The early models were labelled connectionist models and 
were aimed at explaining much more than just the choice between alternatives. 
In the second part of this book I will give a more detailed explanation of how neu- 
ral networks work, but the basic idea of neural networks is that they represent 
(linguistic) systems in the form of weights between input, hidden and output 
nodes. In the context of connectionist models, input nodes see the surface linguistic
forms, hidden nodes are used by the networks to represent the system in
a non-symbolic way and output nodes produce the surface outputs.

Other exemplar-based models have received considerably less attention, see Matthews (2005)
for an overview.

Roughly speaking, there are two kinds of neural network implementations. 
Early connectionist models tried to directly link meaning to form, without any 
kind of category assignment. That is, in a neural network predicting past tense 
formation in English, the network would directly learn the past tense forms of 
verbs and directly produce inflected verbs. The alternative approach is to train 
the model to learn categories. Instead of directly learning that the past tense of 
fly is flew, the model would learn that fly belongs to the class of verbs that form 
the past tense with a vowel change to /ew/ (i.e. an analogical classifier). 

The framework of AM was initially developed by Skousen (Skousen 1989; Sk- 
ousen et al. 2002; Skousen 1992) and has been applied to a variety of differ- 
ent phenomena like gender assignment (Eddington 2002; 2004), compounding 
(Arndt-Lappe 2011), suffix competition (Arndt-Lappe 2014) and past tense forma- 
tion (Derwing & Skousen 1994), among others. Derwing & Skousen (1994: 193) 
summarize the logic behind AM as follows: 


to predict behavior for a particular context, we first search for actual exam- 
ples of that context in an available data base [...] and then move outward 
in the contextual space, looking for nearby examples. In working outward 
away from the given context, we systematically eliminate variables, thus 
creating more general contexts called supracontexts. The examples in a 
supracontext will be accepted as possible analogs only if the examples in 
that supracontext are homogeneous in behaviour. If more than one out- 
come is indicated by this search, a random selection is made from among 
the alternatives provided (Derwing & Skousen 1994: 193) 


The idea is that the classification of an item is made based on how other similar
items are classified. The mathematical implementation is not too important
here; what is important is that AM has essentially the same properties as a neural
network. To be clear, computationally AM and neural networks are very
different from each other. The point is that they are conceptually very similar.
This point has already been argued by Matthews (2005: 289), who explains that
there is no crucial difference between AM and connectionist models, as long as
the connectionist model is trained as a classifier:


In principle, neural networks simply relate inputs to outputs, with an arbitrary number of 
intermediate hidden layers. Inputs and outputs can be anything, not just surface linguistic 
forms. 

This should not be taken to mean that both produce exactly the same result, but that the results 
they produce are very similar.


a [neural] network designed to produce the same category mapping would
have exactly the same property [as AM]. Indeed, when a network is con- 
structed to produce just classificatory outputs, its behaviour is almost iden- 
tical to that produced by AM (Matthews 2005: 289) 


It also follows that other approaches to analogical classifiers do practically the 
same job. Schemata are a way of measuring and finding groups of items that 
are surface similar, the same as the weighted rule approach. Even simple context 
rules like those found in phonology delimit groups of similar items. 


2.2.5 Analogy or rules 


The discussion of analogy/similarity systems vs rule-based systems is not new. 
Nosofsky et al. (1989) observed that rules can be used to compute similarity, 
which in turn would produce analogical systems. The distinction between both 
kinds of processes is not a simple one. The most explicit treatment of the differ- 
ences between analogy and rules is given by Hahn & Chater (1998). The authors 
first acknowledge that with the common conception of rules vs analogy (the authors
use the term ‘similarity’), “the best empirical research can do is to test particular
models of each kind, not ‘rules’ or ‘similarity’ generally” (199), but then
attempt to provide a clear way of distinguishing between rules and analogy.

They identify two distinctions: (i) absolute vs partial matches, and (ii) relative 
degree of abstractness of the stored past elements. Regarding (i) the authors say
that: 


the antecedent of the rule must be strictly matched, whereas in the similar- 
ity comparison matching may be partial. In strict matching, the condition 
of the rule is either satisfied or not - no intermediate value is allowed. Par- 
tial matching, in contrast, is a matter of degree - correspondence between 
representations of novel and stored items can be greater or less (Hahn & 
Chater 1998: 202) 


and regarding (ii) that: 


Second, the rule matches a representation of an instance [...] with a more 
abstract representation of the antecedent of the rule [...], whereas the sim- 
ilarity paradigm matches equally specific representations of new and past 
items. The antecedent 'abstracts away' from the details of the particular 
instance, focusing on a few key properties (Hahn & Chater 1998: 202)


These arguments for distinguishing rules from analogy are unconvincing, however.
The argument in (i) only really matters if we can determine, with some independent
method, the size of the units that the rules or similarity relations should
have. Otherwise, any partial matching process can be emulated with ranked constraints,
decision trees, or weighted or ordered rules, as long as these rules are
smaller than the larger partial match. So, for example, partial string matching
of two strings can be decomposed into categorical matching of their corresponding
substrings: given the strings “aabc” and “aabb”, a categorical rule will find a
partial match, as long as the rule compares three-letter substrings and returns true
whenever at least one of the possible substrings is correctly matched. So, unless
there is some external reason for stating that the size of the comparison
should be four-letter substrings, the distinction between categorical rule-based
and similarity-based comparison is a blurred one.
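
The string example can be spelled out in Python as a quick sanity check. The function names are mine, and comparing substrings position by position is one possible reading of “corresponding substrings”; the sketch is meant only to show that the outcome depends on the chosen unit size.

```python
def substrings(s, n):
    """All contiguous substrings of length n, from left to right."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def categorical_partial_match(a, b, n=3):
    """True if at least one pair of aligned n-letter substrings is identical.

    Every individual comparison is strictly categorical (equal or not);
    the 'partial' match arises only from the unit being smaller than the string.
    """
    return any(x == y for x, y in zip(substrings(a, n), substrings(b, n)))


print(categorical_partial_match("aabc", "aabb", n=3))  # units: aab/aab, abc/abb
print(categorical_partial_match("aabc", "aabb", n=4))  # unit: the whole string
```

With three-letter units the first aligned pair (aab, aab) matches and the rule fires; with four-letter units the only comparison is between the full strings and it fails.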

An additional difficulty with (i) is that it makes rule-based systems a special 
case of similarity-based systems. This is because perfect matching will happen 
in similarity-based systems, which means that any similarity-based system can 
easily emulate a rule-based system. 

Finally, partial matching has the problem that it is not easily computationally 
implementable. Systems which implement partial matching usually do some sort 
of statistical evaluation as in the model by Albright & Hayes (2003), or decom- 
pose matches into smaller pieces. For example, the schema [Kkl....NK] can be sim- 
ulated by doing smaller exact matches of its individual elements. A computer can 
be programmed to do matching based on estimated probabilities or confidence 
values, but in the end there is either a strong threshold, or some randomization
process, neither of which really constitutes partial matching.

The difficulty with (ii) is that, for the purpose of distinguishing between rules 
and similarity, it is a statement that is important from a psycholinguistic perspec- 
tive, but not from a modelling perspective, as the authors admit (203-204): 


Rule-based reasoning implies rule-following: that a representation of a rule 
causally affects the behavior of the system and is not merely an apt sum- 
mary description. Thus, only claims about rule-following are claims about 
cognitive architecture (Hahn & Chater 1998: 203-204) 


Their point is that the distinction about abstractness is important if we are 
concerned about cognitive architecture, because from a purely descriptive per- 
spective the distinction between rules and similarity breaks down. Thus, (ii) is 
more a statement about how speakers store and represent previously encountered
items and the nature of those representations. Although the question of
rich memory is an interesting and important one (see for example Bybee 2010;
Kapatsinski 2014; Port 2010, among many others), it is completely tangential to 
the issue at hand. 

Albright & Hayes' (2003) attempt at distinguishing rules from analogy is even 
vaguer. The authors claim that the key difference between analogy and a rule 
is that rules represent structured similarity, while analogy represents variegated 
similarity. Structured similarity occurs when the similarity function is restricted 
by some structural property of the items it operates on, while variegated similar- 
ity occurs when it is not. If, for example, the similarity function can only look at 
the final syllable of a word, it is making use of structured similarity. The toy
example in (21) illustrates the difference between variegated and structured similarity. 
The rule in (a) makes use of structured similarity while the rule in (b) makes use 
of variegated similarity. While both rules match the same segments, the rule in 
(a) makes use of phonological structure because it restricts the position of the 
similarity to the final syllable of the word. The rule in (b), on the other hand, 
matches any lexemes that contain the sequence /at/ in any position. 


(21) a. class-X / .at# 
b. class-X / at 
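
The two toy rules in (21) can be sketched with regular expressions. This is only an approximation under stated assumptions: forms are plain strings, the final-syllable restriction of (21a) is simplified to "/at/ at the right edge of the word" (syllabification is ignored), and the test forms and the helper name `in_class_x` are invented for the illustration.

```python
import re

# Hypothetical encodings of the two toy rules in (21).
STRUCTURED = re.compile(r"at$")  # (21a): /at/ must close the word
VARIEGATED = re.compile(r"at")   # (21b): /at/ in any position


def in_class_x(form: str, rule: re.Pattern) -> bool:
    """True if the rule assigns the form to class-X."""
    return rule.search(form) is not None


for form in ["kombat", "atlas", "patina", "kilo"]:
    print(form, in_class_x(form, STRUCTURED), in_class_x(form, VARIEGATED))
```

Both patterns look for the same segments; the only difference is the anchor `$`, which ties the match to a structural position.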


This distinction is not very convincing, because it simply makes reference to 
a way of capturing similarity, which is mostly tangential to all other proper- 
ties of analogical models. As Albright & Hayes (2003: 5) then point out, most 
connectionist models can infer structured similarity, which is why they do not 
consider these models as pure analogy. Albright & Hayes (2003) show that struc- 
tured similarity seems to be a fundamental property of the linguistic systems 
they investigate, which they take to be support for rule-based models over 
analogical models. However, although it is true that some models ignore struc- 
ture altogether, lumping connectionist models together with rule-based models 
based on whether phonological structure is at play or not draws an unnecessary 
ad-hoc line between analogy and rules. From this perspective, none of the mod- 
els I use for the case studies are purely analogical, since they heavily make use of 
structural constraints on the similarity function, but they certainly are nothing 
like typical rule-based models. 

Finally, authors like Pothos (2005), working on analogy from a more general 
perspective and not specifically on linguistic systems, have also arrived at the 
conclusion that similarity (analogical) models and rule models are simply two 
extremes of the same gradient. For that reason, I will not attempt to draw clear 
distinctions between analogical and rule-based systems. I will employ neural
networks for the case studies, but these models would work equally well with
hand-written rules or AM. 


2.2.6 Mental representations vs grammatical relations 


Analogical models of grammar, and more generally, analogical accounts of gram- 
matical phenomena are very often mixed in with discussions of mental storage, 
processing and psycholinguistic models (see for example Bybee (2010) and ref- 
erences therein). Eddington (2009: 419-420), for example, claims that “[i]n con- 
trast to rule systems, analogy assumes massive storage of previously experienced 
linguistic material" and that "linguistic cognition entails enormous amounts of 
storage and little processing". This is not restricted to usage-based linguistics:
for example, Gouskova et al.'s (2015) model explicitly mentions storage and 
processing by speakers (see Chapter 6 and the next section). The questions of 
language processing and mental representation of language are important, but 
we can study analogical relations in the lexicon independently of them. 

Distinguishing between mental representations and grammatical descriptions 
is already commonplace in most formal approaches to grammar. Stump (2016: 
63-64), for example, makes a distinction between the mental lexicon (the set of 
forms speakers actually store) and the stipulated lexicon (“the body of lexical in- 
formation that is presupposed by the definition of a language's grammar" (64)). 
Rich mental storage does not go against the idea of a stipulated lexicon, but men- 
tal storage of derived or inflected forms is a tangential question to the items that 
need to be in the stipulated lexicon. Whether speakers only store inflected and 
derived high frequency forms (Pinker & Ullman 2002; Ullman 2001; 2004)²⁰ or 
(possibly) every single form they ever encounter (Baayen 2007; De Vaan et al. 
2007), has no real impact on the number and nature of the items in the stipulated 
lexicon. 

Nevertheless, the linguistic discourse on analogy has not been free from the 
confusion between mental representations and structural properties. The defini- 
tions usually given for analogical models make explicit reference to the mental 
lexicon, storage and actual speaker performance: 


20 This position is relatively common among formal linguists who accept that frequency plays 
a role in processing (see for example Stump 2016 or Müller & Wechsler 2014), but it presents 
a problem with no solution as of yet: in these models, the only way of knowing whether a 
form has high frequency or low frequency, is to know its frequency. And the only way to 
know the frequency of a form is if said form has already been stored (Bybee 2010, but compare 
Baayen & Hendrix 2011). The issue could be circumvented with more complex mental storage 
architectures which can model frequency learning without direct frequency representations 
(Baayen 2011; Baayen et al. 2011; Baayen 2010; Baayen & Hendrix 2011). 
The analogical approach, on the other hand, deals with complex and simplex
lexemes and the way they are connected to each other in the mental 
lexicon. It is argued that the formation of new complex lexemes is based 
on the paradigms of similar existing complex lexemes and their formal 
properties rather than on abstract rules. (Schlücker & Plag 2011: 1540) 


Or: 


An important source of creativity and productivity in language that allows 
the expression of novel concepts and the description of novel situations is 
the ability to expand the schematic slots in constructions to fill them with 
novel lexical items, phrases or other constructions. Considerable evidence 
indicates that this process refers to specific sets of items that have been pre- 
viously experienced and stored in memory. A number of researchers have 
used the term 'analogy' to refer to the use of a novel item in an existing 
pattern, based on specific stored exemplars (Bybee 2010: 57) 


Analogical relations do not require us to postulate mental storage or psycho- 
logical processes and can be formulated independently of how speakers process 
language. The main point linguists working on analogy want to make is that 
analogy expresses a relation between word forms (Becker 1990: 11). 

While it is likely that speakers make use of some form of rich memory, and 
that analogy is closely linked to it, the model developed in this book does not 
require this assumption, but is compatible with it. The model I will develop in the 
following chapter is agnostic about these issues. The advantage of this approach 
is that we can avoid unnecessary debates and, most importantly, remove possible 
confounds. 


2.2.7 Summing up 


From a systemic perspective, there is not a real categorical distinction between 
schemata, computational systems and rules for modelling analogical classifiers. 
In the end, all these systems are used to find abstractions about the shape or 
meaning of lexemes and find similar clusters of lexemes which belong to the 
same class. The real difference appears when the question of learning and mental 
representation is put forward. Similarity-based systems, be they computational 
or schema-based, assume a rich memory model of language, where speakers store 
most of the items they encounter and actively use stored forms to process new 
ones. Rule-based models, on the other hand, make the assumption that rules are 
learned from very few items and stored independently and abstractly. The latter 
type of model does not usually assume rich memory. 

If the main research goal is to address the question of how speakers process 
and represent linguistic structures, then the distinction between rule-based and 
similarity/analogy-based systems is important. However, as far as modelling is 
concerned, we are simply talking about a matter of degree. Schema- and rule-based 
models explicitly write what the similarity relations between items must be, 
while AM and other computational models use mathematical objects to infer and 
store these similarity relations. Because in this book I do not explore the ques- 
tion of mental representation or acquisition, I will not argue for or against any 
particular implementation. The main claims of this book hold true for any of the 
approaches described above.²¹


2.3 Missing pieces 


Despite the great progress that has been made in terms of computational imple- 
mentations of analogical classifiers (from now on also simply analogy), as well 
as in the coverage of different phenomena, there are still a few conceptual is- 
sues that have been ignored and which require an answer. Broadly speaking, 
most work on analogy has been carried out within the cognitive linguistics and 
usage-based linguistics communities (the most recent exception being Gouskova 
et al. (2015), who seem to mostly ignore work coming from these two
communities). For better or worse, research on analogical classifiers has mainly focused 
on developing new and better computational models, as well as trying to find 
out what the limits of analogical classifiers are, by applying them to all kinds of 
phenomena. This, however, has come with a relative lack of attention to proper 
formalization of what analogical classifiers actually are and how they relate to 
grammar. 

Some of the glaring problems were mentioned by Wills & Pothos (2012). The 
authors argue that analogical classifiers (what they call “categorization models") 
suffer from not being explicit about their scope, and because they are fitted to 
each individual phenomenon, models are not consistent with the variables they 
work on. This is an important point. In the examples mentioned in the previous 
section, the analogical classifiers were built to deal with only one alternation. 
Each classifier looks at specific predictors relevant for each phenomenon, and it 


21 One possibly incompatible approach is Optimality Theory (OT). The problems that an OT 
model would face will become clear in the next chapter when I present the formal model of 
analogy. 
is strictly confined to that phenomenon. So far, there is no theory of how this
restriction takes place or how it relates to the rest of the system. The main question 
still lacking an answer is given in (22): 


(22) How do analogical processes (understood as analogical classifiers) interact 
with grammar and with each other? 


This is not trivial. There is, so far, no analogical model that can capture most 
of, let alone all, language domains. There are not even analogical systems that 
can capture most of a single domain. In other words, analogical classifiers are 
designed to capture specific phenomena within a well defined and limited do- 
main, but they cannot capture the whole (or a sizable portion) of the morphology, 
phonology or syntax of a language. This basically means that even if we accept 
that a large number of phenomena in language require, and are best accounted 
for by, analogical systems, grammar (in the form of constructions, features, rules, 
etc.) still needs to take care of the rest. This also means that grammar needs to 
interact with analogy in a clearly defined way. 

Unless the claim is put forward that all grammatical phenomena in language 
can be accounted for with analogy, a formally well defined interface between 
analogy and grammar is required. This interface must make explicit what kinds 
of interactions we see between analogy and grammar, what the domains are and, 
importantly, where the limits of analogy lie. 

The interactions between analogical classifiers are also poorly understood. 
Supposing that a language can have more than one phenomenon which is ex- 
plained by an analogical classifier, it is not clear whether these two classifiers 
interact with each other and how. If a language organizes irregular verbs and 
nouns according to analogical classifiers, one would like to also know whether 
these classifiers are independent from each other and to what degree. 

Another pressing issue relates to the targets of analogy, or the features analogy 
can and cannot see. Albright (2009: 185) correctly points out that "an adequate 
model of analogy must [...] be restrictive enough to explain why speakers gener- 
alize certain statistical properties of the data and not others". This question has 
mostly been ignored. Bybee (2010: 54) emphasizes that "[m]ost analogical forma- 
tions in language are based on semantic or phonological similarity with existing 
forms", but acknowledges that 


The problem faced in the full elaboration of such models, however, is in 
specifying the relevant features upon which similarity is measured. This is 
a pressing empirical problem. We need to ask, why are the final consonants 
of the strung verbs more important than the initial ones? (Bybee 2010: 62) 
There have been a couple of studies which have, if only indirectly, dealt with some 
of these questions. In the generative literature, most phenomena of phonologically
conditioned allomorphy are dealt with either by context rules or by OT, or are 
sometimes simply described but not really modelled (e.g. Rubach 2007: 119). 
In the usage-based literature the issue of analogy-grammar interaction is mostly 
ignored, or taken for granted. As far as I am aware, there have been no attempts 
at explicitly answering the question in (22), only a few informal approaches. 

Probably the most explicit formulation of how analogy interacts with grammar 
is given by Bybee & Beckner (2015). Bybee & Beckner suggest a model where 
analogy classifies lexical items according to whether they are compatible with 
different constructions or not. For Bybee & Beckner (2015), a construction like: 
[X_VERB-d] > [past(SEM(X))],²² is responsible for producing the past tense form 
of regular English verbs. What the analogical classifier does is simply decide 
which verbs can be combined with this construction.²³ However, Bybee & Beckner
(2015) are not really explicit about how or where this happens. There are multiple
alternatives: the analogical classifier could apply immediately whenever any 
new verb is learned and assign a feature to it specifying which inflectional con- 
struction it is compatible with, or it could apply every time a speaker wants to 
inflect said verb. It is also not clear how different constructions compete with 
each other. One could have a classifier which directly decides which construc- 
tions a lexical item is compatible with, or there could be individual classifiers for 
each construction deciding whether some given lexical item is compatible or 
not. 

All this being said, this book is mostly an attempt at formally implementing 
and testing the Bybee & Beckner proposal, where analogy and grammar are in- 
dependent but closely interlinked with each other. 


2.4 Final considerations 


In this chapter I provided a short summary of some of the main different uses of 
analogy in linguistics. I presented single case analogies, proportional analogies 
and analogical classifiers. The main difference between single case analogy and 
analogical classifiers is that the former directly links forms to forms, while the 
latter uses an intermediate abstraction step that links forms to classes. 


22 Bybee & Beckner (2015) use a slightly different notation, but the idea is the same. 

23 A very similar model of analogy-grammar interaction is discussed by Gouskova et al. (2015). 
Working in the framework of Distributed Morphology (Halle & Marantz 1993), Gouskova et al. 
(2015) propose a model for dealing with Russian diminutives based on more or less the same 
principles. See Chapter 6 for a discussion of this model. 
Analogical classifiers are of interest to both usage-based and formal linguists. 
Analogical classifiers are capable of capturing what has been seen as different 
processes (phonologically conditioned allomorphy, inflection classes, gender as- 
signment, etc.) and treat them as a single phenomenon. There are several tech- 
niques used for implementing analogical classifiers (rules, schemata and compu- 
tational implementations), and although superficially very different from each 
other, they are, at their core, very similar and often interchangeable. 

Although there has been a considerable amount of research on analogical clas- 
sifiers, there are still several questions pertaining to the interaction between ana- 
logical models and grammar. Answering these questions is crucial if we want to 
have a better understanding of exactly how much analogy can do and how much 
it cannot do. We want to avoid waving away phenomena by simply invoking 
analogy as a magical solution, but we also want to avoid overly complicating 
grammatical analysis by trying to explain those aspects that analogical models 
can more easily capture. 
In this chapter I will present a model that is able to address the questions raised 
in the previous chapter. This model captures the interactions between analogy 
and grammar, while at the same time being independent of the techniques for 
implementing analogy and agnostic regarding the theory of morphology. This 
allows us to have a system that is flexible enough to be compatible with most 
computational implementations of analogy and with a variety of morphologi- 
cal theories, as well as with usage-based insights, while remaining precise and 
constrained enough to make clear predictions about different properties of ana- 
logical systems. 

The first section of this chapter introduces feature structures and inheritance 
hierarchies and then develops the formal model to relate these structures to ana- 
logical classifiers. The following sections describe the informal set-up of the sys- 
tem and present a possible formalization. 


3.1 Basic assumptions 


3.1.1 Feature structures 


For the representation of lexical items I assume a very simple system of feature 
structures. Feature structures are common in many varieties of Construction 
Grammars (Bergen & Chang 2005; Croft 2001; Goldberg 1995; 2006; Sag et al. 
2012; Steels 2011), as well as in HPSG (Pollard & Sag 1994; Ginzburg & Sag 2000) 
and LFG (Bresnan et al. 2016; Kaplan 1982), among others. Although theories differ 
in their assumptions about feature geometries, the differences mostly represent 
only theory-internal issues. In this book, I use the representation given in (1). 


(1)  [ type
       PHONOLOGY  phon-object
       CATEGORY   cat-object
       SEMANTICS  sem-object ]


3 Modelling analogy in grammar 


Example (1) shows three features in small caps and a type in italics. Features 
can take values, including other feature structures. The two main features I will 
be concerned with in this book are PHONOLOGY and, to a lesser extent, SEMAN- 
TICS. The feature PHONOLOGY contains the phonological representation of the 
lexical item, while the feature SEMANTICS contains the meaning or semantic
representation of the lexical item. The feature CATEGORY contains morpho-syntactic 
properties of the lexical item (e.g. its part of speech and morphosyntactic prop- 
erties, among others). The type specifies to what types the lexical item belongs. 
A simple example of such a representation of the inflected form drew is given in (2): 


(2)  [ transitive-verb
       PHONOLOGY  /dru:/
       CATEGORY   [ PART-OF-SPEECH  verb
                    VERB-FORM       finite
                    TENSE           past ]
       SEMANTICS  PAST(draw) ]


This representation says that the word drew is of the type transitive-verb, in 
a finite verb form, in the past tense.¹ We see that the feature CATEGORY in turn 
takes another feature structure as a value. This is an extremely simplified
representation (other features, such as valency and pragmatic features, would also 
have to be specified), but it is sufficient for the topics covered in 
this book. To reiterate, the three key aspects I will be concerned with are: the 
type of lexical items, their phonology and their semantics. 
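
For readers who find code easier to parse than attribute-value matrices, the structure in (2) can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the class and field names are my own renderings of the features in (1) and (2), and nothing here is meant as an implementation of any particular grammar formalism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Category:
    """Value of the CATEGORY feature: itself a small feature structure."""
    part_of_speech: str
    verb_form: str
    tense: str


@dataclass(frozen=True)
class LexicalItem:
    """A typed feature structure with the three features shown in (1)."""
    type: str
    phonology: str
    category: Category
    semantics: str


# The inflected form 'drew' from (2); the semantic notation is informal.
drew = LexicalItem(
    type="transitive-verb",
    phonology="dru:",
    category=Category(part_of_speech="verb", verb_form="finite", tense="past"),
    semantics="PAST(draw)",
)

print(drew.type, drew.category.tense)
```

The nesting of `Category` inside `LexicalItem` corresponds to the observation that a feature can take another feature structure as its value.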


3.1.2 Type hierarchies 


As mentioned above, analogical models work on the type of lexemes. In theories 
like Construction Grammar and HPSG, types are organized in hierarchies which 
help to capture common properties between different items. Hierarchies "provide 
tools for optimal encoding of lexical knowledge" because "properties of individual 
lexical items can be factored out into various general classes, each defined by the 
common attributes of its members" (Koenig 1999: 13). In this book, I adopt a very 
general version of type hierarchies. I do not assume any specific theory or any 
particular version of what the lexicon looks like. 


1 I am using an informal semantic notation for the sake of simplicity. Any formal representation 
would also be compatible with the ideas of this book. 
In a type hierarchy, types specify all common properties of their members, 
and their members inherit these properties. In other words, all members of a 
type have to satisfy the constraints imposed by that type. Types can have sub- 
types and super-types and can inherit from multiple super-types at the same 
time. This creates a complex network of relations for any given leaf type. By
assumption, theories like HPSG take inheritance to be monotonic, and features of 
super-types cannot be overwritten by sub-types (Corbett & Fraser 1993; Brown 
& Hippisley 2012). Some versions of Construction Grammar, however, do oper- 
ate with non-monotonic inheritance (Booij 2010). For Booij (2010), lexical items 
can overwrite certain features imposed by their type. This approach helps make 
the hierarchical organization somewhat simpler. In this book I will assume that 
lexical items cannot overwrite features imposed by their types, but in the end 
either approach would work. The example in Figure 3.1 shows schematically the 
idea behind multiple inheritance. 


word
├── pos
│   ├── noun
│   ├── adjective
│   └── verb
└── valency
    ├── transitive
    └── intransitive

kill: inherits from verb and transitive
run:  inherits from verb and intransitive


Figure 3.1: Example of multiple inheritance 


In Figure 3.1, run and kill share a set of properties by virtue of both being verbs 
(say, the feature [pos verb], which says that they are verbs), but kill inherits its 
valency from transitive while run inherits its valency from intransitive. Schemat- 
ically we have: 


(3)  [ verb
       CAT  [pos verb] ]


(4)  [ transitive
       VALENCY  (Subj, Obj) ]


(5)  [ intransitive
       VALENCY  (Subj) ]


(6)  [ transitive-verb
       PHON     /kɪl/
       CAT      [pos verb]
       VALENCY  (Subj, Obj)
       SEM      kill ]


(7)  [ intransitive-verb
       PHON     /rʌn/
       CAT      [pos verb]
       VALENCY  (Subj)
       SEM      run ]


A couple of observations are necessary regarding multiple inheritance. As in 
this example, it is usually the case that multiple inheritance systems assume fea- 
ture compatibility. The features inherited must be compatible. If we have the type 
for noun as in (8), there could not be a type that inherits from both noun and verb 
at the same time, because the values of pos clash. 


(8)  [ noun
       CAT  [pos noun] ]
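
The compatibility requirement can be sketched as follows. The code is an illustration only: the dictionary `TYPE_CONSTRAINTS`, the function `inherit` and the exception `FeatureClash` are hypothetical names, and the constraints are just those of (3)-(5) and (8), flattened into simple feature-value pairs.

```python
class FeatureClash(Exception):
    """Raised when two super-types impose different values on one feature."""


# Constraints imposed by each type, following (3)-(5) and (8).
TYPE_CONSTRAINTS = {
    "verb": {"pos": "verb"},
    "noun": {"pos": "noun"},
    "transitive": {"valency": ("Subj", "Obj")},
    "intransitive": {"valency": ("Subj",)},
}


def inherit(*supertypes: str) -> dict:
    """Collect the constraints of all super-types; nothing may be overwritten."""
    merged = {}
    for name in supertypes:
        for feature, value in TYPE_CONSTRAINTS[name].items():
            if feature in merged and merged[feature] != value:
                raise FeatureClash(f"{feature}: {merged[feature]!r} vs {value!r}")
            merged[feature] = value
    return merged


print(inherit("verb", "transitive"))    # features shared by 'kill'
print(inherit("verb", "intransitive"))  # features shared by 'run'
try:
    inherit("noun", "verb")
except FeatureClash as clash:
    print("no such type:", clash)
```

Because inheritance is taken to be monotonic here, a clash is an error rather than an occasion for one value to override the other.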


In Chapter 6, I will discuss cases in which regular multiple inheritance does 
not work in this way. An alternative is to use empty types. Empty types are types 
which impose no constraints on their members, and from which nothing is
inherited.² The idea behind empty types is that groups of lexical items share the 
common property of undergoing some morphological process or taking some 
particular marker, but we want to formally separate the groups themselves from 
the actual morphological process. Using empty types can help us capture sev- 
eral inflection class phenomena, including cases of multiple inheritance. We can 
expand the hierarchy in Figure 3.1 to include inflection class.³

In Figure 3.2, class-ʌ→æ and class-d/t do not need to specify any feature. 
They are there to help the right inflectional constructions or rules apply to the 


2 Notice that empty types are only empty with regard to the morphological process, but they can, 
and in fact do, specify phonological and semantic constraints on their members, as described in 
the next sections. 

3 In Figure 3.2, the type infl-class is a sub-type of word, which is done only for convenience. A 
more detailed type hierarchy would probably specify inflection class elsewhere. 
word
├── pos
│   ├── noun
│   ├── adjective
│   └── verb
├── valency
│   ├── trans
│   └── intrans
└── infl-class
    ├── class-d/t
    ├── class-ʌ→æ
    └── class-ɪ→ʌ

kill: inherits from verb, trans and class-d/t
run:  inherits from verb, intrans and class-ʌ→æ


Figure 3.2: Example hierarchy for English verb inflection 


right items. In this case, the construction for regular verbs will add a /d/ or /t/ to 
the stem of the verb, while the construction for class-ʌ→æ will change the /ʌ/ 
to an /æ/. 
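
The division of labour between empty types and constructions can be sketched in a few lines. All names are assumptions made for the illustration (the vowel-changing class is written class-ʌ→æ), the voiceless inventory is incomplete, and the regular construction deliberately ignores the /ɪd/ allomorph.

```python
VOICELESS = set("ptkfsʃθ")  # assumed, incomplete inventory


def construction_d_t(stem: str) -> str:
    """Regular past: /t/ after a voiceless segment, /d/ otherwise (no /ɪd/ case)."""
    return stem + ("t" if stem[-1] in VOICELESS else "d")


def construction_vowel_change(stem: str) -> str:
    """Past for the vowel-changing class: replace /ʌ/ with /æ/."""
    return stem.replace("ʌ", "æ")


# The inflection class types carry no features; they only select a construction.
CONSTRUCTIONS = {
    "class-d/t": construction_d_t,
    "class-ʌ→æ": construction_vowel_change,
}

# Toy lexicon: each lexeme is listed with the inflection class type it belongs to.
LEXEMES = {"kɪl": "class-d/t", "rʌn": "class-ʌ→æ"}


def past(stem: str) -> str:
    """Apply the construction licensed by the lexeme's inflection class type."""
    return CONSTRUCTIONS[LEXEMES[stem]](stem)


print(past("kɪl"), past("rʌn"))
```

The types themselves do no morphological work in this sketch; the mapping from type to construction is what makes the right marker appear on the right item.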

This approach is roughly equivalent to saying that all lexemes specify their 
inflection class, but it has the additional property that we can easily organize in- 
flection classes in a way that allows us to capture properties that they potentially 
share. Of course, there are alternatives to this approach, in which the markers 
of the inflection classes are directly specified in the latter, but such an approach 
will add extra complexity that is neither necessary nor helpful for the arguments 
brought forth in this book. 

As will be shown in the next sections, the model minimally requires that there 
be a subtree of the hierarchy which organizes lexemes according to their inflection
class. The only important assumption here is that typing is responsible for at 
least some morphomic properties of a system, like inflection classes, and that shared 
properties between inflection classes can be captured by the use of mid-level 
types. 

What would not work for the analogical classifiers is to have a model where 
inflection classes are given directly by features on the lexical entries. Example 
(9) shows such an entry for the verb kill. 


(9)  [ tr-verb
       PHON        /kɪl/
       CAT         [pos verb]
       INFL-CLASS  class-dt
       SEM         kill ]


The feature structure in (9) says that the lexical entry for the verb kill has an 
inflection class feature which specifies that it belongs to class-dt. In the following 
sections it will become clear that the reason this kind of approach would not work 
is that even if the values of the feature INFL-CLASS were organized in a hierarchy, 
said hierarchy would not be able to impose constraints on the PHON and SEM 
features of lexemes. 


3.2 Analogy as type constraints 


Having introduced feature structures and type hierarchies in the previous sec- 
tion, we can now address the question of how analogy interacts with grammar 
formulated in the previous chapter. The solution I will pursue in this book is to 
link analogical classifiers to types in the hierarchy. The claim is spelled out in 
(10): 


(10) Analogical constraints are limited to types and can only run along the inher- 
itance lines of the hierarchy. 


I will call this hypothesis Analogy as a type constraint (ATC). As far as I am 
aware, this is not an explicit assumption of any analogical classifier that has 
been proposed in the literature, but implicitly most models seem to make use of 
something similar. Analogical constraints in AM, for example, are limited to the 
lexemes that take part in some inflectional or derivational phenomenon, and the 
assumption is that the model does not generalize or analogize across phenomena 
(e.g. a model would not capture strong verb and strong noun inflection in Ger- 
man at the same time, but two independent models would each apply to each 
phenomenon). 

I propose that analogical classifiers do not operate on a multiple category basis. 
Instead, classifiers operate on a type by type basis. For each type, its classifier says 
what the phonology and semantics of the items that belong to that type must be 
like. This means that classifiers are not multinomial, but binomial. This is a new 
view of analogy. Usually, analogical classifiers are understood as systems which 
assign a category to an item based on their phonology and semantics. This is the 
effect we see. But, if we want to properly integrate analogy into the grammar, 
we need to decompose its classifiers into multiple binary classifiers.⁴ A toy example
with the irregular English verb classes already mentioned can help illustrate 


4 This is not too different from what multinomial regression models do. From a computational 
perspective, whether one trains the models as individual binary classifiers or as one big multi- 
nomial classifier makes no real difference. However, because directly training multinomial 
models is much simpler, I will take this route when implementing the analogical models. 
this crucial point. The example in Figure 3.3 shows how a multinomial classifier
works: the classifier takes a word and decides what class it belongs to. 


                      ┌──> class-ʌ→æ
verb ──> CLASSIFIER ──┼──> class-ɪ→ʌ
                      └──> class-dt


Figure 3.3: Example of multinomial classifier 


In contrast, the example in Figure 3.4 shows how a binomial classifier works: 
For each of the three classes (class-a, class-b and class-c), an analogical classifier 
decides whether a given verb belongs to said class or not. 


                                   ┌──> true
verb ──> CLASSIFIER class-ʌ→æ ─────┤
                                   └──> false

                                   ┌──> true
verb ──> CLASSIFIER class-ɪ→ʌ ─────┤
                                   └──> false

                                   ┌──> true
verb ──> CLASSIFIER class-dt ──────┤
                                   └──> false


Figure 3.4: Example of binomial classifier 
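
The relation between the two figures can be sketched in code. The membership tests below are hand-written stand-ins for trained classifiers and are entirely invented for the illustration, as are the function names; the point is only that each type answers a yes/no question, and that the familiar "assign a class" behaviour can be read off those answers.

```python
# Hypothetical membership tests, one per type (stand-ins for trained classifiers).
BINARY_CLASSIFIERS = {
    "class-ʌ→æ": lambda stem: stem.endswith("ʌn"),
    "class-ɪ→ʌ": lambda stem: stem.endswith("ɪŋ"),
    "class-dt": lambda stem: not stem.endswith(("ʌn", "ɪŋ")),
}


def binomial(stem: str) -> dict:
    """One true/false decision per type, as in Figure 3.4."""
    return {cls: test(stem) for cls, test in BINARY_CLASSIFIERS.items()}


def multinomial(stem: str) -> str:
    """A single class label, as in Figure 3.3, read off the binary decisions."""
    accepted = [cls for cls, ok in binomial(stem).items() if ok]
    if len(accepted) != 1:
        raise ValueError(f"{stem!r} accepted by {len(accepted)} classes")
    return accepted[0]


print(binomial("rʌn"))
print(multinomial("kɪl"), multinomial("stɪŋ"))
```

In this toy setting the binary decisions happen to be mutually exclusive, so the multinomial view is just a summary of them; nothing in the sketch says how real classifiers should resolve conflicting answers.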


This approach restricts analogical models in several ways. First, because this 
model is strictly based on lexemic organization (that is, not on fully inflected 
words),⁵ analogical models cannot target morphological features on their own. 
For example, under these assumptions, no analogical model could classify dative 
vs. accusative nouns, or distinguish between a diminutive and an augmentative. 
These are features determined by morphological processes, not by the hierarchy 
of the lexicon (but compare Koenig 1999). This restriction is one of the key dif- 
ferences with respect to word-based models that employ analogy for identifying 
and analyzing fully inflected forms. 


5 One could, of course, expand this model to also operate on inflected words. However, the 
implications of such a model are unclear. 
There are several implications of the ATC. First, if analogy is restricted to the 
hierarchy, it means that analogy is always categorical. Non-categorical usage 
preferences are a separate phenomenon. Most important, however, is the claim 
that analogy runs through the hierarchy. If this is the case, we expect to see clear 
reflexes of the structure of the hierarchy on the analogical relations between 
lexemes. This is the main prediction of the ATC. 


3.2.1 Analogy is categorical 


There seems to be some degree of implicit (and sometimes more explicit)
assumption that analogy (not only in the sense of analogical classifiers, but
also in proportional analogies) is fuzzy or similar to soft, violable
constraints. For example, Matthews (2010: 880), speaking about gender assignment
in French, claims that “since the cues studied, especially the phonological,
tend to be probabilistic and, hence, capable of violation, it is not surprising
that connectionist models [...] have often been to the fore in such machine
learning work since they implement general statistical principles and allow for
‘soft constraint’ satisfaction”. In this case, it is not clear what
probabilistic is supposed to mean. Connectionist models are not probabilistic;
they are statistical. In neural networks, there are no stochastic processes,⁶
and outcomes are never probabilistic (although they may be probabilities).⁷ A
neural network trained to predict the gender of French nouns will always give
the same prediction for the same input. From the very same domain of French
gender assignment, there seems to be some evidence that the (analogical) process
by which speakers decide the gender of new French nouns is deterministic.
Studies in which native speakers have to decide on the gender of new words have
usually found high degrees of interspeaker agreement (Tucker et al. 1968; 1977;
Holmes & de la Bâtie 1999).

⁶ Technically, nodes in a neural network start in a random activation state, but
this initial state has little impact on the final weights.

⁷ Systems like stochastic OT (Boersma 1997; 1998; Boersma & Hayes 2001) do work
stochastically, in the sense that there is a probabilistic process at work, and
the outputs it produces are distributed according to some density function.

Analogical classifiers are not (or should not be) gradient or fuzzy. They should
predict class membership categorically. However, this does not mean there is no
room for gradience in the ATC model. Gradience can be seen in usage, especially
when, given two grammatical choices, speakers tend to prefer one over the other,
or when there are contextual cues which correlate with the alternatives. The
degree to which one of the alternatives is preferred over the others is gradient
because it does not consist of some categorical property but lies on a
continuum. This kind of phenomenon has been studied extensively in corpus
linguistics (Bresnan et al. 2007; Bresnan & Hay 2008; Francis & Michaelis 2014;
Hay & Bresnan 2006; Kapatsinski 2012). The role of analogical classifiers here
is to determine what the grammatical alternatives are, but speakers can have
different preferences with regard to these. Of course, there are cases where
speakers are unsure about new lexical items, or where different speakers do not
agree on the classification of some wug. At least two different explanations
could be behind these phenomena. One case arises if an analogical classifier
finds all classes are inadequate for an item because the item does not fit into
any class. If an item does not fit any of the possible classes well, it is
natural that speakers will have trouble categorizing it. Another scenario
causing uncertainty in categorization occurs when an item is assigned to two
incompatible classes by the analogical model. If a new item matches two
incompatible types (e.g. two different genders), there will be uncertainty about
the class the item should belong to.

A potential concern regarding binary analogical classifiers (i.e. classifiers
which only return true or false) is that they could produce multiple class
assignments. In a case with two types τ and σ, if the classifier that says which
items are allowed to be τ cannot see what the classifier for σ does, one could
expect that there would be many cases of multiple assignments, since both
classifiers could allow for some lexeme to belong to both τ and σ. This is not a
problem. The fact that a classifier allows some item to belong to multiple
classes does not actually mean that, in the grammar, the item will belong to
multiple classes. Classifiers are not responsible for final class assignment;
they simply say whether a lexeme could potentially belong to some type, not that
it has to belong to that type. A word like nieve ‘snow’ in Spanish could be
either masculine or feminine from its phonological and semantic properties, but
it is feminine for all speakers. The fact that analogical classifiers set up
this way could produce multiple class assignments not found in the grammar is
not a real issue. In other words, analogical classifiers do not say that lexemes
with certain phonological and semantic properties must belong to some type τ,
but rather that all lexemes that belong to type τ must fulfill the
aforementioned phonological and semantic properties.
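
The difference between "may belong to a type" and "must belong to a type" can be made concrete with a small sketch. The regular expressions below are hypothetical toy cues, not an analysis of Spanish gender, and the dictionary `LEXICON` simply stands in for whatever part of the grammar fixes the actual type of a lexeme.

```python
import re

# Hypothetical, deliberately loose constraints: both genders tolerate forms
# ending in -e, so both classifiers accept "nieve".
CONSTRAINTS = {
    "masculine": re.compile(r"(o|e|[lnr])$"),
    "feminine": re.compile(r"(a|e|d)$"),
}

# Stand-in for the grammar: the type each lexeme actually has.
LEXICON = {"nieve": "feminine", "gato": "masculine"}


def compatible_types(form):
    """All types whose constraint the form satisfies (possibly several)."""
    return {t for t, pattern in CONSTRAINTS.items() if pattern.search(form)}


def is_licensed(form):
    """The listed type must be among the compatible ones, nothing more."""
    return LEXICON[form] in compatible_types(form)


print(compatible_types("nieve"))  # both types are compatible
print(is_licensed("nieve"))       # the listed type is one of them
```

The classifiers only check a necessary condition on members of a type; they never choose between the compatible types.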


3.2.2 Analogy runs through the hierarchy 


That analogy runs through the hierarchy is the main claim of this book, and 
most of the case studies in Part II will focus on providing evidence for this claim. 
If analogical models are restricted by the inheritance hierarchy, and analogies 
themselves are constraints attached to specific types, then we would expect to 
see reflexes of the shape of the hierarchy in the analogical relations. 
Although previous work on analogical classifiers seems to make this assumption
in some way, it has never been stated explicitly. Analogical models are always
proposed and trained for distinguishing types in direct paradigmatic opposition.
There are no analogical models that distinguish between intransitive verbs
and feminine countable nouns. Models for predicting gender are assumed to only 
predict gender, models for distinguishing verb inflection classes are assumed to 
only predict verb inflection classes, etc. This is not because of a limitation of 
the statistical methods used, since neural networks and AM could be trained 
to do this. Analogical models are not trained to do this because, intuitively, it 
would make no sense. Constraining analogy to the hierarchy straightforwardly 
accounts for why this is the case. 

This account has one direct consequence. If we accept that analogical models 
help to predict types in the hierarchy, there is no reason to think that analogical 
models can only predict the most specific types. Suppose an analogical model 
could discriminate between the four leaves (X, Y, Z, W) in Figure 3.5. 


[Figure: class has two sub-types, τ and σ; τ has the leaf types X and Y, and σ
has the leaf types Z and W.]

Figure 3.5: Basic hierarchy example


In such a case, the analogical model would also be equally capable of
distinguishing between the intermediate types τ and σ (it simply has to map
X, Y → τ and Z, W → σ). This would also be true of any grouping we make of X, Y,
W and Z, not only grammar-based groupings. However, if analogy is directly
linked to the types in the hierarchy, we expect that types τ and σ may have
analogical constraints of their own, which means that necessarily X and Y have
to share some constraints not found in Z and W, and similarly, Z and W will
share constraints not found in X and Y. This has the implication that leaf types
will be more similar to each other if they share a common super-type.
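
The mapping from leaf predictions to intermediate types is purely mechanical, as the following minimal sketch shows. The dictionary `PARENT` encodes the hierarchy of Figure 3.5; the function names are illustrative only.

```python
# Leaf type -> intermediate type, following Figure 3.5.
PARENT = {"X": "τ", "Y": "τ", "Z": "σ", "W": "σ"}


def lift(leaf_predictions):
    """Turn predictions over leaf types into predictions over super-types."""
    return [PARENT[leaf] for leaf in leaf_predictions]


def same_supertype(a, b):
    """True if two leaf types are sisters under the same intermediate type."""
    return PARENT[a] == PARENT[b]


print(lift(["X", "W", "Y"]))     # ['τ', 'σ', 'τ']
print(same_supertype("X", "Y"))  # True
```

Any other grouping of the four leaves could be encoded in `PARENT` in the same way, which is why the substantive claim concerns constraints attached to τ and σ themselves, not the mapping.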

This might sound radical, but it is not. It is just the logical extension of an
analogical classifier that works on leaf types. In an analogical classifier of
genders, we assume that two feminine nouns will be more similar to each other
than to masculine nouns because they are both under the same type feminine. The
claim the ATC hypothesis makes is that there is nothing special about leaf
types, and that exactly the same relations hold for more abstract types. There
are no additional assumptions involved in this proposal, no UG requirements, and
no major incompatibilities with other theories of grammar. The relevant
inheritance hierarchies follow directly from observable morphological behaviour,
and the analogical constraints follow directly from observable phonological and
semantic features.

There are several shapes hierarchies can take (see next section). Depending 
on the exact form of the hierarchy describing some morphological process, we 
expect to see very different effects from the analogical relations. Part II of this 
book presents several case studies from different languages that try to exemplify 
what happens in different systems, and show that the predictions of the ATC hold
in every case. 


3.3 The (semi-)formal model 


In any theory with a type system, the type hierarchy has to fully specify what
the type of each individual object in that hierarchy is. All sub-type relations
are fully listed. For a given type τ, the list of objects of this type must be
specified: {a_τ, b_τ, c_τ, ...}. From a morphosyntactic perspective, τ specifies
those features shared by all items of type τ. For example, τ can specify
[POS noun], and thus all lexical items of type τ will also share this feature.

There is nothing that prevents a type from also specifying phonological (and
semantic⁸) features. This means that τ could specify [PHON /#pt/].⁹ This would
mean that all lexical items of type τ have an initial /pt/ cluster. Notice that
some sort of phonotactic constraint must be in place in any case. All lexical
items in a language must abide by phonotactic rules. Similarly, we can claim
that τ can impose phonological (analogical) constraints. Analogical constraints
rarely apply to all items of a certain type, but rather build subgroups within
some type. For example, Colombian Spanish words may begin with either full
vowels or consonants, but not glides. This constraint, in a theory like OT,
could be written as *JW-ONSET, but it can also be written as a disjunction of
what is allowed: /#C/ ∨ /#V/. Assuming that this constraint is in some general
type shared by all words in Spanish, then all words would necessarily have to
start with either a consonant or full vowel. That is, for a lexical item w to
belong to τ, it must satisfy one of a set of constraints specified in τ.¹⁰ I
will call these constraints analogical constraints if they help discriminate
between two or more classes.

⁸ Properly specifying semantic features is much more complex than specifying
phonological features. For this reason, all examples presented here only list
phonological constraints, but, in principle, the same can be done for semantics.

⁹ I will use phonological notation, with # marking word edges, as a shorthand
for: ⟨/pt/⟩ ⊕ nelist

To give an example from Sanskrit: the nominal inflection in Sanskrit has five
classes (Whitney 1986): a-stems; i- and u-stems; (long vowel) ī-, ū-, and
ā-stems; r-stems; and C-stems (consonant stems). Table 3.1 presents the paradigm
of a-stems and C-stems.


Table 3.1: Sanskrit inflection classes according to Whitney (1986)

a-stem, kāma- ‘desire’

               Singular    Dual          Plural
Nominative     kām-as      kāma-u        kāmā-s
Accusative     kām-am      kāma-u        kāmā-n
Instrumental   kām-ena     kāmā-bhyām    kāma-is
Dative         kām-āya     kāmā-bhyām    kāme-bhyas
Ablative       kām-āt      kāmā-bhyām    kāme-bhyas
Genitive       kām-asya    kāma-yos      kāmā-nām
Locative       kām-e       kāma-yos      kāme-ṣu
Vocative       kām-a       kāma-u        kāmā-s

C-stem, vāk- ‘voice’

               Singular    Dual          Plural
Nominative     vāk-∅       vāc-āu        vāc-as
Accusative     vāc-am      vāc-āu        vāc-as
Instrumental   vāc-ā       vāg-bhyām     vāg-bhis
Dative         vāc-e       vāg-bhyām     vāg-bhyas
Ablative       vāc-as      vāg-bhyām     vāg-bhyas
Genitive       vāc-as      vāc-os        vāc-ām
Locative       vāc-i       vāc-os        vāk-ṣu
Vocative       vāk-∅       vāc-āu        vāc-as


The individual exponents of these declensions are not too important here; the
important point is that the two classes take exponents which are too different
to have a purely phonological explanation. In other words, it is unlikely that
the exponents of cells like the genitive singular -sya and -as are
phonologically derived from each other. These really are different markers that
target different inflection classes. What matters is that Sanskrit requires at
least five inflection class types (subdivisions within these five types are very
likely necessary), to which nouns must belong. Thus, the generalization about
the ending of the stems according to inflection class is an analogical
constraint in the sense used in this book.

¹⁰ From the previous discussion it should be clear that ultimately, the notation
system and the technique we use to specify the analogical relations are of
secondary interest. Any of the approaches described in the previous chapter
should work with this system.

In terms of analogical constraints, a noun belongs to class a-stem if its stem
ends in /a/ or /ā/, that is, a-stem nouns in Sanskrit must satisfy: /a#/ ∨ /ā#/,
while C-stem nouns must satisfy: /C#/.¹¹
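
As a minimal sketch, the two constraints can be written as regular expressions over transliterated stems. The vowel inventory in `VOWELS` is a simplification assumed here for illustration, and only the two classes discussed in the text are modelled.

```python
import re

# Simplified vowel inventory, assumed for illustration only.
VOWELS = "aāiīuūeo"

CLASS_CONSTRAINTS = {
    "a-stem": re.compile(r"[aā]$"),          # /a#/ ∨ /ā#/
    "C-stem": re.compile(rf"[^{VOWELS}]$"),  # /C#/
}


def admissible_classes(stem):
    """Return the classes whose stem-final constraint the stem satisfies."""
    return [name for name, pat in CLASS_CONSTRAINTS.items() if pat.search(stem)]


print(admissible_classes("kāma"))  # ['a-stem']
print(admissible_classes("vāc"))   # ['C-stem']
```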

Types can also specify negative constraints on what is disallowed. This follows
because a negative constraint like ¬/#p/ would be the product of a disjunction
of positive constraints: /#a/ ∨ /#b/ ∨ ..., missing /#p/. Negative constraints
are useful in cases with default types that exclude a very specific set of
lexemes (as shown below).

To sum up, so far we have the general setup for integrating analogy into the
grammar: types have analogical constraints associated with them, which members
have to satisfy. Additionally, a type σ, sub-type of τ, can specify further
analogical constraints its members must have. There are two alternatives at this
point. We could either postulate a unification-based system where the
constraints in τ and σ are unified to build a more complex constraint, or we can
simply specify that inheritance is given by an ∧ relation between the set of
constraints in τ and σ, and use a boolean algebra. I will pursue the second
option in this section, but either approach would work.

¹¹ The constraint /C#/ could be further decomposed into all the actual
consonants a noun of the C-stem declension can end with. Alternatively, one
could use feature decomposition and claim the constraint targets [+cons].

I have been using simple phoneme-based representations for PHON constraints, but
these could take many different shapes and forms. These constraints could be
based on feature decomposition, or on distance from a set of prototypes of the
class. That is, a constraint could say that any lexeme of type τ must not be too
different from some prototypical lexeme (or set of lexemes) w. Constraints of
this type could take the following form:

    τ [PHON f_p(w) < n]

where f_p is a function which measures the distance of w from the prototype p,
and n is a set threshold. There are multiple ways of measuring distances between
strings (e.g. the Levenshtein distance; Levenshtein 1966), but the distance
could also be based on perception, i.e. what speakers perceive to be similar or
different.¹²

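
A prototype constraint of this kind can be sketched with the Levenshtein distance as the distance function. The prototype, the threshold and the decision to accept a form that is close to at least one prototype are all assumptions made here for illustration; a perceptual distance could be substituted for `levenshtein` without changing the rest.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs for insertion, deletion, substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def within_threshold(w, prototypes, n):
    """The distance to at least one prototype is below n (an assumed reading)."""
    return any(levenshtein(w, p) < n for p in prototypes)


print(levenshtein("kitten", "sitting"))         # 3
print(within_threshold("narat", ["narot"], 2))  # True
```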
We can define inheritance of analogical constraints as follows. If σ is a
sub-type of τ, then

(12) if constraint c holds of type τ and σ is a sub-type of τ, then constraint c
     holds of σ

This can be easily extended to multiple inheritance:

(13) if constraints c₁ and c₂ hold of types τ₁ and τ₂, respectively, and σ is a
     sub-type of both τ₁ and τ₂, then constraints c₁ and c₂ also hold for σ.
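
Definitions (12) and (13) amount to taking the conjunction of a type's own constraints with everything it inherits. The sketch below implements this with constraints as boolean functions; the toy hierarchy, the type names and the phonological patterns are hypothetical and serve only to show single and multiple inheritance side by side.

```python
import re
from typing import Callable, Dict, List

Constraint = Callable[[str], bool]


def phon(pattern: str) -> Constraint:
    """Wrap a regular expression as a boolean PHON constraint."""
    rx = re.compile(pattern)
    return lambda form: rx.search(form) is not None


# Hypothetical toy hierarchy: γ is a sub-type of both τ and σ.
PARENTS: Dict[str, List[str]] = {"τ": [], "σ": [], "γ": ["τ", "σ"]}
OWN: Dict[str, List[Constraint]] = {
    "τ": [phon(r"^[ptk]")],    # τ: begins with a voiceless stop
    "σ": [phon(r"[aeiou]$")],  # σ: ends in a vowel
    "γ": [],                   # γ: nothing beyond what it inherits
}


def all_constraints(t: str) -> List[Constraint]:
    """Own constraints plus everything inherited from all super-types."""
    inherited = [c for p in PARENTS[t] for c in all_constraints(p)]
    return inherited + OWN[t]


def satisfies(form: str, t: str) -> bool:
    """Inheritance as conjunction: every collected constraint must hold."""
    return all(c(form) for c in all_constraints(t))


print(satisfies("tana", "γ"))  # True: satisfies both inherited constraints
print(satisfies("tan", "γ"))   # False: fails the constraint inherited from σ
```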


To model special cases and exceptions, we only have to add the full phonological
specification of said exception. If, in a toy language, τ allows words starting
with a dental or the stem paner, it would specify the constraints:
/#t/ ∨ /#d/ ∨ /#paner#/. This straightforwardly accounts for productivity
issues. New items of type τ can only start with t or d, but there is the
exception paner. We find such an instance in the German gender system, where
nouns ending in /aj/ are feminine. A few exceptions are words like Ei (/aj/,
‘egg’), or Blei (/blaj/, ‘lead’). This means that neuter would specify
[PHON (... ∨ /#aj#/ ∨ /#blaj#/ ∨ ...) ∧ ¬/.aj#/] (where the ‘.’ stands for any
phoneme), including the exceptions to the /aj/ pattern.¹³,¹⁴ And, similarly,
from a semantic perspective, in German all alcoholic drinks take the masculine
gender except for Bier ‘beer’, which is neuter.
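
The toy-language constraint /#t/ ∨ /#d/ ∨ /#paner#/ translates directly into a single pattern in which the exception is listed as a complete form. This is only a sketch of the toy case; the test words are invented.

```python
import re

# /#t/ ∨ /#d/ ∨ /#paner#/: dental-initial forms, plus one fully listed stem.
TAU = re.compile(r"^(?:[td].*|paner)$")


def allowed_in_tau(form):
    """True if the form satisfies one of the disjuncts of the constraint."""
    return TAU.match(form) is not None


for form in ["tola", "dim", "paner", "panel", "pato"]:
    print(form, allowed_in_tau(form))
```

Since the exception is listed whole, a new form such as the invented panel is rejected even though it differs from paner in a single segment, which is the productivity effect described above.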

¹² Unlike measures such as the Levenshtein distance, perceptual distances factor
in the relative prominence of different phonemes, among other things. For
example, confusion between /t/ and /d/ might be much higher than confusion
between /k/ and /g/ in some languages, despite the fact that these pairs of
phonemes only differ in one feature.

¹³ It is worth noting here that the /aj/ string is not always part of the lexeme
but it can be a gender assigning suffix.

¹⁴ Notice that listing /#aj#/ and /#blaj#/ in the set of possible phonological
shapes for neuter nouns does not ensure on its own that there will be two neuter
nouns with phonology /#aj#/ and /#blaj#/; it only means that these are possible
shapes neuter nouns can take.

A different kind of special case is that of default types. Default types, with
regard to analogy (which may or may not overlap with morphological defaults),
are types where remaining cases land. This situation occurs where a series of
types have strict analogical constraints, and one type allows for every item
which does not fit well into any of the other types. However, a default type
situation is only a particular distribution of analogical constraints and not
something especially coded into the system. This can be illustrated with some
toy examples. Suppose there are two types in competition: σ [PHON /#C/] and
τ [PHON /#V/]. In such a case it makes no sense to talk about a default
distribution because the analogical constraints are complementary and (depending
on the phonotactics of the system) do not say anything about which of the two
types will likely have more items. Now suppose that the types in competition
are: σ [PHON /#k/] and τ [PHON /#C/]. In such a case σ can only accept items
which begin with /k/, while τ can accept any item which begins with a consonant
(including /k/). We can say then that τ is an overlapping default that accepts
every item, including items which could belong to σ. Finally, suppose now that
the constraints across both types are as follows: σ [PHON /#k/] and
τ [PHON /#C/ ∧ ¬/#k/]. In this case, τ is a non-overlapping default that only
accepts remaining items that do not belong to σ. These are only the three basic
cases, and complex combinations of these three cases may be at work within a
system. For example, in a case with three types τ, σ and γ, τ may be a
non-overlapping default with respect to σ, and at the same time τ may be an
overlapping default with respect to γ.¹⁵
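
The three basic configurations can be compared side by side in a short sketch. The consonant inventory `C` and the test forms are invented for illustration; the patterns mirror the constraints given in the text.

```python
import re

C = "ptkbdgmnlrs"  # assumed toy consonant inventory

SYSTEMS = {
    "complementary": {"σ": r"^[%s]" % C, "τ": r"^[aeiou]"},
    "overlapping default": {"σ": r"^k", "τ": r"^[%s]" % C},
    "non-overlapping default": {"σ": r"^k", "τ": r"^(?!k)[%s]" % C},
}


def accepting_types(system, form):
    """Types whose constraint accepts the form under the given system."""
    constraints = SYSTEMS[system]
    return sorted(t for t, pat in constraints.items() if re.search(pat, form))


for name in SYSTEMS:
    print(name, accepting_types(name, "kana"), accepting_types(name, "tana"))
```

Only in the overlapping configuration is a /k/-initial form accepted by both types; no type is marked as a default anywhere in the code, which matches the point that defaults are just a particular distribution of constraints.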

We expect that in a system, the types that have the highest number of members
will also have the fewest constraints. Having less strict analogical
constraints means allowing for more different items.

An important feature of this system is that there are no statistics directly as- 
sociated with any of the analogical constraints. Statistical systems can help us 
infer the constraints and find patterns of preference, but this is independent of 
the actual grammar. It is irrelevant how many feminine nouns in German end in 
/aj/, since the constraint is categorical. Actual numbers and proportions proba- 
bly play a role in language acquisition but are not really relevant for the formal 
grammar specification. For example, in German there is a statistical preference 
for nouns ending in /e/ to be feminine, but nouns ending in /e/ do not have to be 
feminine. This means that all genders in German have the constraint /e#/ (i.e. no
gender in German has the constraint ¬/e#/).

With the model in place, we can calculate the predictions of different hierarchy
shapes. In a simple tree-like hierarchy, as in Figure 3.6, we expect that, if
non-σ has any analogical constraints, items that belong to τ₁ will share more
features with items that belong to τ₂ than with items that belong to σ. This is
because τ₁ and τ₂ have to satisfy any analogical constraints in non-σ, while
items in σ do not. However, if non-σ has no particular analogical constraints,
then we do not have any particular expectation regarding what we should see in
terms of similarity between these three leaf types.

[Figure: the root type has two sub-types, σ and non-σ; non-σ in turn has the
leaf types τ₁ and τ₂.]

Figure 3.6: Simple inheritance hierarchy

In a case of multiple inheritance as in Figure 3.7, we expect that the γ type
will look like both τ and σ, but τ and σ will share less. This is because γ is
stricter in its analogical constraints. Only those constraints which are
compatible between Τ and Σ will be available for items belonging to γ, while all
constraints on Τ are available to τ and all constraints on Σ to σ, and since
these need not overlap, it is easier for τ and σ items to be different from each
other.

[Figure: two super-types, Σ and Τ; σ is a sub-type of Σ, τ is a sub-type of Τ,
and γ is a sub-type of both Σ and Τ.]

Figure 3.7: Multiple inheritance example

¹⁵ This follows if, e.g.: τ [PHON /#C/ ∧ ¬/#k/], σ [PHON /#k/], and
γ [PHON /#l/]


Although the predictions are clear, we cannot expect a perfect correlation be- 
tween the observed analogical relations and the shape of the hierarchy in all 
cases. There are several factors that can give rise to mismatches. First, the exis- 
tence of overlapping default types will cause confusion between the default type 
and all other sister types, independently of hierarchy. If τ and σ are sister
nodes, and τ has a constraint such that [PHON /a#/], while σ has none, both
types will allow words ending in a. The second reason is that transparent types
will result in effectively flat hierarchies. A transparent type is a type that
imposes no analogical constraints. If, for example, in Figure 3.6, non-σ has no
constraints of its own, for analogy it is as if all three leaf types in the tree
were at the same level, and thus only the specific constraints in σ, τ₁, and τ₂
will play a role.

A final advantage of this model is that it is learnable and thus compatible with 
usage-based approaches to language. Although it makes use of abstract types, 
these follow directly from the surface inflectional or derivational behaviour of 
lexemes. Because analogical constraints associated with abstract types must be 
inherited by more concrete types, they are also visible on the surface of words. 
The toy hierarchy in Figure 3.8 shows a simple example of how this works. The
words on the leaf nodes directly instantiate the constraints in their
super-types, and these in turn instantiate the constraints in Λ and Π.


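[Figure: the root type has two sub-types, Λ [PHON /V#/] and Π [PHON /C#/], each
with three leaf types. The leaf constraints and their lexemes are as follows.]

Super-type        Leaf constraint          Lexemes
Λ [PHON /V#/]     [PHON /i#/]              kani, li
Λ [PHON /V#/]     [PHON /a#/]              ptara, llanna
Λ [PHON /V#/]     [PHON /e#/]              ipe, arine
Π [PHON /C#/]     [PHON /v#/ ∨ /b#/]       prav, lab
Π [PHON /C#/]     [PHON /t#/]              narat, ot
Π [PHON /C#/]     [PHON /l#/]              orol, sil

Figure 3.8: Complete hierarchy example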


It is easy to see that in a case like Figure 3.8, all analogical constraints on 
the mid-level types are directly the product of generalizing across the leaf types, 
just as the constraints of the leaf types are the product of generalizing across 
the lexemes. Real world examples are not as simple, but should follow the same 
pattern. 
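
The bottom-up generalization in Figure 3.8 can be mimicked with a short sketch. The lexemes are those of the figure; the vowel set and the two-way V/C generalization are simplifying assumptions, and a realistic learner would of course need something far less crude.

```python
# Lexemes of Figure 3.8, grouped by leaf type under each mid-level type.
HIERARCHY = {
    "Λ": [["kani", "li"], ["ptara", "llanna"], ["ipe", "arine"]],
    "Π": [["prav", "lab"], ["narat", "ot"], ["orol", "sil"]],
}
VOWELS = set("aeiou")  # assumed toy vowel inventory


def final_segments(lexemes):
    """Leaf-level generalization: the set of attested final segments."""
    return {w[-1] for w in lexemes}


def generalize(leaves):
    """Mid-level generalization across the leaf types of one super-type."""
    finals = set().union(*(final_segments(leaf) for leaf in leaves))
    if finals <= VOWELS:
        return "/V#/"
    if finals.isdisjoint(VOWELS):
        return "/C#/"
    return None  # no uniform generalization: a transparent type


print(final_segments(["prav", "lab"]))  # the segments v and b
print(generalize(HIERARCHY["Λ"]))       # /V#/
print(generalize(HIERARCHY["Π"]))       # /C#/
```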

There are some possible objections to the claim that such a model is usage- 
based compatible. For example, Eddington (2009: 428), discussing analogical clas- 
sifiers, claims that “in analogical models words are not parsed into morphemes, 
but stored as wholes”. At first sight this seems incompatible with the idea that 
the lexicon organizes lexemes, and not inflected forms. However, both views are 
possible. It is likely that speakers store fully inflected items, and keep track of 
most items they encounter (De Vaan et al. 2007); however, this does not entail 
that speakers store unanalyzed items. Rather, there is evidence to the contrary 
(Roelofs & Baayen 2002). In a usage-based model, speakers can store all inflected 
forms they encounter, but still organize lexemes according to their inflectional 
and derivational behaviour. 


3.4 Final remarks 


In this chapter, I have proposed a model that can help answer the open questions 
of how analogy interacts with grammar in a way that makes it compatible both 
with (several) grammatical theories, and also with most assumptions from
usage-based linguistics. The claim of the ATC model is that analogical
classification is closely linked to the hierarchy, and thus it reflects aspects
of the organization of the lexicon. This view produces a system in which analogy
is categorical and operates on a type by type basis.¹⁶ In Part II, I present
evidence from various languages and phenomena that shows strong support for the
model proposed here. It
is important to note, however, that the semi-formalization of the previous section 
is not a requirement for the thesis of this book. The empirical results presented 
in Part II are the main contribution of this work. 


¹⁶ Notice that this model does not imply that the hierarchy comes first, and then analogy attaches
to it or the other way around. This model is completely silent as to how both analogy and the 
hierarchical organization of the lexicon are acquired. It is my hope that different models of 
language acquisition should be compatible with it. 
In this chapter I present, somewhat informally, the statistical and data
visualization tools I will use in the rest of this book. Readers interested in
the mathematical details should consult the cited sources, since detailed
descriptions and explanations of these techniques would require a short book of
their own and presuppose a strong statistical background. My intention is only
to provide the reader with a good intuition of what these methods do.


4.1 On the general methodology 


Many of the phenomena I discuss in the following case studies have been studied 
exhaustively before, and it is not my intention to develop complete analyses for 
any of these cases. Moreover, in several instances, I will use sub-optimal
models that ignore semantics or other possible strong predictors. This should
only make the main point stronger: formal analogy occurs even in unexpected
cases, and it follows the grammatical hierarchy of the language. Similarly, I do
not provide full formal linguistic analyses, but rather only sketches to
motivate a plausible type hierarchy. It is my intention that the ideas proposed
in this book can be formally implemented in different linguistic theories
(Construction Grammar, Cognitive Grammar, Paradigm Function Morphology, HPSG or
similar). This is why theoretical assumptions are kept to a minimum.

I make no strong claims concerning the psycho-linguistic reality of these
models. The fact that we can predict, to a greater or lesser degree, word classes from
formal properties of words, does not mean that speakers necessarily do the same. 
It is possible that the way speakers perform class assignment in some of the lan- 
guages studied has some parallels to the models proposed here, but it is also 
possible that speakers do rely on different aspects of cognition. These are related 
but independent questions. The patterns I will present could be productive, or 
vestiges of previous systems, but not any less real. I will, however, make some 
connections with some ideas about cognitive aspects of language during the final 
discussion. 

There are two main reasons for the choice of languages in this study:
theoretical relevance and data availability. I have not tried to compile a
representative typological sample. Far from it: most examples are from
Indo-European languages of various subfamilies, and only a few are taken from
African languages. The analogical models require electronic, morphologically
annotated dictionaries, which are still very rare for languages spoken by
smaller language communities. The theoretical relevance relates to the classes a
language has and how they are organized.


4.2 Statistical models and methodology 


For all the cases to follow I use the same general method for building the
analogical models. From the stem of the words (nouns or verbs) I extract
predictors which might be at play in the analogical relations and fit a neural
network with the nnet package (Venables & Ripley 2002)¹ in R (R Development Core
Team 2008). The use of neural networks has a long history in linguistics, and
they are usually linked to connectionist models (Bechtel & Abrahamsen 2002;
Churchland 1989; McClelland & Rumelhart 1986; Rumelhart & McClelland 1986a).
However, I do not make any claims about the underlying linguistic system, or the
rightness or wrongness of connectionism. The use of neural networks for the
following analogical models is purely practical. Similar effects could probably
be achieved using different algorithms like Random Forest (Breiman 2001) or
Support Vector Machines (Smola & Schölkopf 1998; Schölkopf & Smola 2001). For
the present book, the actual technology is not important, only the concept
behind it.² My aim is to show that prediction is possible, not to find the best
possible method.

The stems in the models are not theoretical objects, and the ideas in these
models should be compatible with word-based models. The idea is that there is a
distinction between the phonological material that expresses some property like
MASCULINE, and the phonological material that expresses the meaning ‘cat’. I
take the stem to be the full word minus the phonological material that marks the
category at hand. In a trivial Spanish example, the stem of gato ‘male cat’ is
/gat/, since the /o/ segment is the gender correlate, and we have the opposing
form gata ‘female cat’. For many non-trivial cases some compromise had to be
reached, and it will be described in detail. Crucially, this approach does not
consider underlying representations, only surface forms. Of course, one could
compare the results of a model based on some sort of underlying representation
with the results of more surface oriented models.

¹ For all models I used the softmax linking function, with the maximum number of
weights and iterations set in a way that the models converge. Whether a hidden
layer was present or not, and the number of hidden units, varied from model to
model.

² The next subsection provides a simple illustration of how the analogical
models work, but the interested reader should consult Venables & Ripley (2002)
for a rigorous mathematical explanation of neural networks and the nnet package.

The predictors used to fit the analogical models vary slightly from model to
model, but they always contain phonological information about the shape of the
word. The most straightforward way to do this is to simply take a set number of
letters at the end (or beginning) of the stems and use them (together with their
position) as predictors. The positional information distinguishes between, say,
a t at the end of the stem and a t in the third-to-last position. This way of
specifying phonological shape has advantages and disadvantages. A good aspect of
this approach is that the model can, on its own, infer classes of phonemes as
represented by letters. If x and y share some phonological feature which makes
them into a natural class, and are thus predictive of the same inflection class
or gender, the model will simply assign the corresponding weights to said
inflection class. This means that we do not need a rich phonological
representation to arrive at phonological analogies. A possible issue, which
might unfairly benefit or harm the models, is that in cases of low
correspondence between the phonology and the orthography, certain spelling rules
might contain some additional information not directly available to speakers, or
some important information might be missing. There is no easy way to solve this
problem, short of using detailed phonological transcriptions, which are
unavailable for most of the languages under consideration. Moreover, any
phonological process like metathesis, which could be easily captured by a
rule-based approach, will be invisible to the model, thus reducing the amount of
information available. To reduce loss of information due to some spelling
systems representing a single phoneme with a character sequence (e.g. Spanish
and German), I simplified the spelling by assigning special characters to those
regular sequences. Some phonological information is, however, non-recoverable
from the orthography (e.g. some vowel length/quality information in German, or
the difference between long and double vowels in Kasem).
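The idea of using stem-final letters together with their position can be made concrete with a small sketch. The function name, the predictor labels (final_1, final_2, ...) and the padding symbol are illustrative assumptions, not the setup actually used for the models; the stems are made up.

```python
def final_letter_predictors(stem, n=3, pad="_"):
    """Map the last n letters of a stem to position-labelled predictors.

    final_1 is the last letter, final_2 the second to last, and so on.
    Stems shorter than n are left-padded with a placeholder symbol
    (the padding convention is an assumption made for this sketch).
    """
    padded = pad * max(0, n - len(stem)) + stem
    return {f"final_{i}": padded[-i] for i in range(1, n + 1)}


# A t in final position and a t in third-to-last position end up as
# values of two different predictors.
for stem in ["lat", "tal", "lo"]:
    print(stem, final_letter_predictors(stem))
```

Because the position is part of the predictor name, the same letter can receive different weights depending on where in the stem it occurs.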

To prevent overfitting,³ I apply ten-fold cross-validation to every model. This
is done by splitting the dataset into ten groups. The general model is then
fitted using nine of the groups as training data, and its predictions are tested
on the group not used for fitting it. The process is repeated for each of the
ten groups. This way we can look at all the data while preventing overfitting
(Kohavi 1995).
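The splitting step can be sketched as follows. This is only an illustration of the fold logic under simple assumptions (a random shuffle with a fixed seed, folds of near-equal size); the function names are hypothetical and model fitting itself is left out.

```python
import random


def k_fold_indices(n_items, k=10, seed=1):
    """Shuffle item indices and deal them into k folds of near-equal size."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_splits(n_items, k=10, seed=1):
    """Yield one (training, testing) pair of index lists per fold."""
    folds = k_fold_indices(n_items, k, seed)
    for held_out in range(k):
        testing = folds[held_out]
        training = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield training, testing


# Every item is used for testing exactly once, and never in the same
# round in which it is used for training.
splits = list(cross_validation_splits(n_items=25, k=10))
for training, testing in splits:
    print(len(training), len(testing))
```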


³Overfitting happens when models predict the same items they learned from. This
is a problem because if a model is overfitted, it does not really tell us much
about how good the predictors are on novel items.

In Section 2.2, I discussed four possible ways of implementing analogy, and
argued that the difference is a gradient rather than a truly categorical one. The 
present models fall somewhere between a weighted multiple-rule-based model 
like those presented by Albright & Hayes (2003) and Albright (2009) and a purely 
stochastic model like NDL (Arppe et al. 2018; Baayen 2011; Baayen et al. 2011) or 
AM (Skousen 1989; Skousen et al. 2002; Skousen 1992; Arndt-Lappe 2011; 2014). 
The difference between the present model and a weighted rule-based model is 
that I consider all possible patterns within some structurally defined positions in 
the word (e.g. the last two segments, the last consonant, the number of syllables, 
etc.), and do not attempt to predefine the rules of the model, or decide to include 
or exclude some patterns. The difference from a completely stochastic system lies in
the same property: the current model is sensitive to structural properties of the
lexemes it sees, while NDL and AM are "blind" or completely amorphous. Like
AM, and unlike some previous connectionist systems (Bechtel & Abrahamsen
2002; Churchland 1989; McClelland & Rumelhart 1986; Rumelhart & McClelland
1986a), the analogical model used here sees linguistically defined categories as
the outputs. In traditional connectionist systems the networks directly paired 
semantics to sounds (Matthews 2005). 

The similarities between these kinds of systems have been observed before: 


Connectionist networks themselves further illustrate the problem, in that 
they might be seen to fall in both camps. Back-propagation networks are 
often described as depending on similarity...However, they are also often 
described as using 'implicit rules' which can be extracted using appro- 
priate analysis... Therefore, back-propagation networks appear rule- and 
similarity-based (Hahn & Chater 1998: 200) 


In any case, it should be clear that I am not arguing for neural networks as 
a necessarily better implementation of analogical systems, or as a psychologi- 
cally plausible system. Neural networks as used here are just one of the many 
alternatives we have to model analogy. 

All this being said, a cleverer and more carefully designed model, similar to
the weighted multiple-rule-based models presented by Albright & Hayes (2003) and
Albright (2009), or that of Beniamine & Bonami (2016), would probably perform
better for any particular case and be more psychologically plausible. These
models have some downsides, however. The most important one is that they require
much better structured datasets, with complete phonological information. This
requirement is harder to fulfill than the rough, semi-phonemic transcriptions
required for the neural network models. A second difficulty of these models is
that they are extremely slow to fit because the rule inference step is
computationally intensive. This makes it impossible in practice to fit many
different models for each case and hence impractical to test various
combinations of predictors.


4.3 Analogical models using neural networks 


The easiest way to explain the intuition behind the models, and the tools I use
for evaluating them, is with concrete examples. Suppose a language has two
inflection classes, A and B. The dataset in (1) presents stems for lexemes
belonging to these inflection classes.


(1) a. A: lama, lara, lado, laso, pama, ra, dal, kar, tsar, sek, cess
    b. B: egrr, liz, lo, loi, lu, lip, roop, oppe, toi, olor, gin, grip, wik


There is no single (simple) rule which can predict to what class a given lexical
item belongs. However, intuitively, the first vowel seems to be a strong indicator.
All items for which the first vowel is a belong to class A, while items for which
the first vowel is i, o or u belong to class B. Items with e are found in both groups.
Because there are only a few lexical items, and the pattern is fairly simple, this
generalization is fairly evident, but in a more complex system it would be much
harder to spot. These observations could also be inferred with a statistical model.
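The first-vowel generalization can be checked mechanically by tabulating the first vowel of each stem in (1) against its class. This is a plain count, not a statistical model; the helper name first_vowel is introduced here for illustration.

```python
from collections import Counter

CLASS_A = ["lama", "lara", "lado", "laso", "pama", "ra", "dal", "kar", "tsar", "sek", "cess"]
CLASS_B = ["egrr", "liz", "lo", "loi", "lu", "lip", "roop", "oppe", "toi", "olor", "gin", "grip", "wik"]
VOWELS = "aeiou"


def first_vowel(stem):
    """Return the first vowel letter of a stem."""
    return next(ch for ch in stem if ch in VOWELS)


# Count how often each first vowel occurs in each class.
counts = Counter()
for label, stems in (("A", CLASS_A), ("B", CLASS_B)):
    for stem in stems:
        counts[(first_vowel(stem), label)] += 1

for vowel in VOWELS:
    print(vowel, "A:", counts[(vowel, "A")], "B:", counts[(vowel, "B")])
```

The counts show a only in class A, i, o and u only in class B, and e in both classes.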

Put in simple terms, given a training dataset with items, and a series of
predictors for each item, the neural network model learns from these items and
assigns weights to the predictors. When presented with new data (the testing
dataset), the network calculates from the weights the probability of each
outcome for each item in the dataset. This is achieved in the following way. The
neural network sets a baseline for the prediction, based on one of the levels
for each predictor. For each predictor, the first level (alphabetically) is
chosen for the baseline node (in Model 1 below: s1 = c, s2 = a). This baseline
has a weight for each outcome (the classes to be predicted). To each other level
of each predictor, the network assigns a weight for each outcome. The weights of
the levels present in the input are added to the baseline to calculate the
probability of each outcome given some input.

We can apply this to our previous example. We split the data set into a training 
and a small testing dataset. For the testing dataset we randomly select the items: 
lama, lara, kar, egrr, liz, oppe, grip, and we assign the rest of the items to the 
training dataset. For illustration, we can fit two different models. In Model 1, we 
set two predictors s1 and s2, which correspond to the first and second letter
in the items, respectively. For Model 2 we set v1 and c1, which correspond to
the first vowel and first consonant of the items, respectively. We can then train 
Model 1 and Model 2. 
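The two predictor sets and the training/testing split can be written down as follows. The function and variable names are illustrative assumptions; only the stems, the seven testing items and the definitions of s1, s2, v1 and c1 come from the text.

```python
VOWELS = "aeiou"
STEMS = {
    "A": ["lama", "lara", "lado", "laso", "pama", "ra", "dal", "kar", "tsar", "sek", "cess"],
    "B": ["egrr", "liz", "lo", "loi", "lu", "lip", "roop", "oppe", "toi", "olor", "gin", "grip", "wik"],
}
TEST_ITEMS = {"lama", "lara", "kar", "egrr", "liz", "oppe", "grip"}


def model1_predictors(stem):
    """Model 1: first and second letter of the stem."""
    return {"s1": stem[0], "s2": stem[1]}


def model2_predictors(stem):
    """Model 2: first vowel and first consonant of the stem."""
    v1 = next(ch for ch in stem if ch in VOWELS)
    c1 = next(ch for ch in stem if ch not in VOWELS)
    return {"v1": v1, "c1": c1}


labelled = [(stem, label) for label, stems in STEMS.items() for stem in stems]
training = [(stem, label) for stem, label in labelled if stem not in TEST_ITEMS]
testing = [(stem, label) for stem, label in labelled if stem in TEST_ITEMS]


def levels(extract, name):
    """Alphabetically sorted levels of one predictor over all stems."""
    return sorted({extract(stem)[name] for stem, _ in labelled})


# The alphabetically first level of each predictor serves as the baseline.
print("s1:", levels(model1_predictors, "s1"))
print("s2:", levels(model1_predictors, "s2"))
print("v1:", levels(model2_predictors, "v1"))
print("c1:", levels(model2_predictors, "c1"))
```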

Figure 4.1 shows the structure of the neural network for Model 1. As can be
seen, there is a direct connection between the predictors and the outcomes (a
skip layer), and there are no intermediate steps (hidden layers). We can see how
each letter predicts A or B. The thickness of the line represents the absolute value
of the weight (thicker lines have larger absolute values), and the color represents
whether the weight is positive (dark gray) and favours the outcome, or negative
(light gray) and disfavours the outcome. In the node marked as B1 we have the
baseline (the bias node) made up of the combination of the levels c for s1 and a for
s2.⁴ This combination clearly favours A, as we would expect. If any of these levels
changes in the input, then the nodes in the skip layer activate and counteract the
baseline. If, say, the input contains a u, i or o instead of an a in s2, then the
corresponding node will strongly activate the outcome B, as we, again, would
expect from the dataset. The complete set of weights from the inputs to the
outcomes for Model 1 is given in Table 4.1.

To calculate the actual class probabilities from the output weights, we use the 
softmax function. The intuition of this function is that, given a vector of weights, 
it will transform that vector into a vector of probabilities, where the element with 
the highest weight will receive the highest probability, and all probabilities add 
up to 1. The general form of this function is given in equation (2). In prose, we 
exponentiate each weight, and divide by the sum of all exponentiated weights. 


(2) $S(y_i) = \dfrac{e^{y_i}}{\sum_{j} e^{y_j}}$


As an example, assume the weight vector [2, 1, 0.1]. Exponentiating each mem- 
ber we get the vector [7.4, 2.7, 1.1], and their sum is 11.21. Dividing the exponen- 
tiated weights by the sum we get the probabilities [0.66, 0.24, 0.1]. 
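The worked example can be reproduced in a few lines. This is a direct transcription of equation (2); the function name is the only thing introduced here.

```python
import math


def softmax(weights):
    """Exponentiate each weight and divide by the sum of the exponentials."""
    exponentiated = [math.exp(w) for w in weights]
    total = sum(exponentiated)
    return [e / total for e in exponentiated]


probabilities = softmax([2, 1, 0.1])
print([round(p, 2) for p in probabilities])  # [0.66, 0.24, 0.1]
```

For very large weights, implementations usually subtract the largest weight from all weights before exponentiating; this leaves the result unchanged and avoids numerical overflow.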

To know how well the model performs, we predict the outcomes of the testing
dataset, build a confusion matrix and calculate different accuracy scores. The
predictions that the model made for each testing item are shown in Table 4.2.
There were two errors in total: egrr and grip. It is easy to see why these
errors happen: there are no comparable items in the training dataset; grip
starts with a gr sequence and egrr is the only item with an e as first letter
and g as second letter.

⁴The model chooses the baseline levels purely on alphabetical order.

Figure 4.1: Representation of Model 1


Table 4.1: Weight table for Model 1

      weight  predictor  response  variable
  1     5.42  c-a        A         baseline
  2     3.43  d          A         s1
  3    -0.29  e          A         s1
  4    -3.21  g          A         s1
  5     0.58  k          A         s1
  6     2.89  l          A         s1
  7    -7.13  o          A         s1
  8     4.37  p          A         s1
  9     1.89  r          A         s1
 10     2.24  s          A         s1
 11     0.81  t          A         s1
 12    -2.62  w          A         s1
 13     4.67  e          A         s2
 14     0.61  g          A         s2
 15   -13.22  i          A         s2
 16    -8.12  l          A         s2
 17   -14.45  o          A         s2
 18     0.03  p          A         s2
 19    -0.33  r          A         s2
 20     4.25  s          A         s2
 21   -18.31  u          A         s2
 22    -5.43  c-a        B         baseline
 23    -3.88  d          B         s1
 24    -0.35  e          B         s1
 25     3.28  g          B         s1
 26     0.49  k          B         s1
 27    -2.19  l          B         s1
 28     7.49  o          B         s1
 29    -4.96  p          B         s1
 30    -2.67  r          B         s1
 31    -2.02  s          B         s1
 32    -0.49  t          B         s1
 33     2.95  w          B         s1
 34    -5.20  e          B         s2
 35    -0.48  g          B         s2
 36    12.77  i          B         s2
 37     8.35  l          B         s2
 38    13.70  o          B         s2
 39    -0.38  p          B         s2
 40    -0.11  r          B         s2
 41    -3.81  s          B         s2
 42    17.85  u          B         s2

Table 4.3 shows the confusion matrix for the predictions of Model 1, and
Table 4.4 shows a matrix with the positions of True Positives (TP), True Negatives
(TN), False Positives (FP) and False Negatives (FN). The TP and TN are cases
where the class predicted by the model matches the real class of the items. FP
and FN are the cases where the class predicted by the model does not match
the real class of the items. The total population N is the sum of all these values:
TP + TN + FP + FN.


Table 4.2: Predictions Model 1

     Predicted  Observed  Word
  1  A          A         lama
  2  A          A         lara
  3  A          A         kar
  4  A          B         egrr
  5  B          B         liz
  6  B          B         oppe
  7  A          B         grip


Table 4.3: Confusion matrix for Model 1

              Reference
  Prediction  A   B
  A           3   2
  B           0   2


Table 4.4: Diagram of True Positives, False Positives, True Negatives
and False Negatives

              Reference
  Prediction  A   B
  A           TP  FP
  B           FN  TN

The accuracy is the number of correct predictions divided by the total number
of items. Additionally, we can calculate the confidence interval (CI) of the
accuracy by using a binomial test (Clopper & Pearson 1934; Newcombe 1998). 
The No Information Rate (or accuracy of a model under a no information situ- 
ation) is calculated as the largest class percentage in the data. In this case, A's 
class percentage is 0.4286 and B's is 0.5714, thus the latter is taken to be the No 
Information Rate. In other words, the No Information Rate is the accuracy of a 
model that always predicts the most frequent outcome. In our example data B 
is the most frequent outcome. If the model predicted all outcomes to be B, then 
it would reach an accuracy of 4/7 = 0.5714. Models where all predictors have 
no information regarding the outcomes (i.e. they are poor predictors) tend to 
have an accuracy close to the No Information Rate, because always predicting 
the most frequent outcome guarantees the highest possible accuracy under a no 
information situation. The model is then said to perform above chance if the No 
Information Rate is less than the lower limit of the accuracy confidence interval. 
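These quantities can be recomputed from the predictions in Table 4.2. The sketch below assumes that the confidence interval is the exact (Clopper-Pearson) binomial interval, found here by bisection so that only the standard library is needed; all function names are hypothetical.

```python
from math import comb

# Model 1 predictions and observed classes for the seven testing items (Table 4.2)
PREDICTED = ["A", "A", "A", "A", "B", "B", "A"]
OBSERVED = ["A", "A", "A", "B", "B", "B", "B"]


def upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bisect_increasing(f, target, tol=1e-10):
    """Find p in [0, 1] with f(p) = target, for a function f increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact binomial confidence interval for k successes out of n trials."""
    lower = 0.0 if k == 0 else bisect_increasing(lambda p: upper_tail(k, n, p), alpha / 2)
    upper = 1.0 if k == n else bisect_increasing(lambda p: upper_tail(k + 1, n, p), 1 - alpha / 2)
    return lower, upper


n = len(OBSERVED)
correct = sum(p == o for p, o in zip(PREDICTED, OBSERVED))
accuracy = correct / n
no_information_rate = max(OBSERVED.count(c) for c in set(OBSERVED)) / n
ci_lower, ci_upper = clopper_pearson(correct, n)
above_chance = no_information_rate < ci_lower

print(round(accuracy, 4), round(no_information_rate, 4))
print(round(ci_lower, 4), round(ci_upper, 4), above_chance)
```

With five correct predictions out of seven, the No Information Rate lies well inside the interval, so the model does not count as performing above chance.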

Three additional statistics I will use in certain cases are specificity,
sensitivity and negative predictive value. Specificity is the proportion of
negatives that are identified as such (= TN/(TN + FP)), while sensitivity is the
proportion of positives that are identified as such (= TP/(TP + FN)). The negative
predictive value (= TN/(TN + FN)) will help us identify the class to which more
items from other classes are misclassified. These three statistics are not relevant for
this particular example because we only have two classes here, but can be used,
by class, in models with more than two outcomes.
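For completeness, the three formulas can be applied to the counts of Table 4.3, taking A as the positive class as in Table 4.4. The function name and dictionary keys are illustrative.

```python
def class_statistics(tp, fp, fn, tn):
    """Sensitivity, specificity and negative predictive value from the four counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "negative_predictive_value": tn / (tn + fn),
    }


# Counts for Model 1 with A as the positive class: TP = 3, FP = 2, FN = 0, TN = 2
stats = class_statistics(tp=3, fp=2, fn=0, tn=2)
print(stats)
```

In a model with more than two outcomes, the same function would be applied once per class, treating that class as positive and all other classes together as negative.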

Finally, the kappa statistic compares the observed accuracy with the expected 
accuracy (under random chance). The expected accuracy is calculated as follows. 
We multiply the observed frequency of A by the predicted frequency of A, and 
the observed frequency of B by the predicted frequency of B. We then divide 
these numbers by N, add them together and divide again by N. Thus, we get: 


(3) $\text{Expected.Accuracy} = \dfrac{\dfrac{(TP+FN)(TP+FP)}{N} + \dfrac{(TN+FP)(TN+FN)}{N}}{N}$


To calculate the kappa statistic itself we use the following equation:


(4) $\text{Kappa} = \dfrac{\text{observed.accuracy} - \text{expected.accuracy}}{1 - \text{expected.accuracy}}$

Kappa scores⁵ go from 0 (in a perfectly random model) to 1 (in a perfectly
accurate model); a kappa of 0.5 is halfway between the expected accuracy and 1.
The advantage of using kappa is that it tells us how well above random chance
the model is performing, and, to a degree, it allows us to make model comparisons.
The disadvantage is that there is no standardized interpretation and no objective
cutoff point. A model with a kappa of 0.2 is not inherently bad, nor can it be said
that it is at chance level. However, we can say that a model with a kappa of 0.7
is better than a model with a kappa of 0.5.
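Equations (3) and (4) translate directly into code. The sketch below applies them to the counts of Model 1 (Table 4.3) and, for comparison, to those of Model 2 (Table 4.8); the function names are hypothetical.

```python
def expected_accuracy(tp, fp, fn, tn):
    """Expected accuracy under chance agreement, following equation (3)."""
    n = tp + fp + fn + tn
    return ((tp + fn) * (tp + fp) / n + (tn + fp) * (tn + fn) / n) / n


def kappa(tp, fp, fn, tn):
    """Kappa statistic, following equation (4)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = expected_accuracy(tp, fp, fn, tn)
    return (observed - expected) / (1 - expected)


model1 = dict(tp=3, fp=2, fn=0, tn=2)
model2 = dict(tp=3, fp=1, fn=0, tn=3)

print(round(expected_accuracy(**model1), 4), round(kappa(**model1), 4))
print(round(expected_accuracy(**model2), 4), round(kappa(**model2), 4))
```

For Model 1 this gives an expected accuracy of 0.4694 and a kappa of 0.4615, the values worked out in the footnote to this passage.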

Table 4.5 shows the relevant statistics for Model 1. In this case, because our
dataset is so small, the model's accuracy cannot be said to be better than chance:
the No Information Rate (0.5714) lies above the lower limit of the confidence
interval (0.2904).


Table 4.5: Overall statistics for Model 1

  Overall statistics:

  Accuracy             0.7143
  95% CI               (0.2904, 0.9633)
  No Information Rate  0.5714
  Kappa                0.4615


A second possible model for our dataset is to specify more linguistic
information in the predictors. In Model 1 all we have is information about the
position of the segments, but not information about their nature. An alternative
is to set up a model where the predictors are not selected by position only, but
also by class. Instead of using the first and second letters in the pseudowords,
we will now use the first consonant (c1) and the first vowel (v1). Figure 4.2
shows the structure of Model 2 as before. By selecting more structural
predictors we have somewhat reduced the complexity of the model,⁶ but the same
generalization remains: the main predictor is the first vowel of the word. The
full set of weights for the model is given in Table 4.6.


⁵For Model 1, equation (3) gives
$\text{Expected.Accuracy} = \dfrac{\frac{(3+0)(3+2)}{7} + \frac{(2+2)(2+0)}{7}}{7} = \dfrac{23}{49} \approx 0.4694$.
Thus, we have that $\text{kappa} = \dfrac{0.7143 - 0.4694}{1 - 0.4694} = 0.4615$.
Notice that the Expected Accuracy is different from the No Information Rate
because the former is taken from a model that knows about the distribution of
the outcomes in the training dataset, while the latter is a completely random
assignment of outcomes to inputs in the testing dataset.

⁶Notice this is only the case because of the characteristics of this dataset. In
more complex datasets a more structured model will usually be more complex than
a less structured one because it requires more information.

Figure 4.2: Representation of Model 2


Table 4.6: Weight table for Model 2

      weight  predictor  response  variable
  1     7.43  a-c        A         baseline
  2    11.87  e          A         v1
  3   -21.81  i          A         v1
  4   -24.18  o          A         v1
  5   -34.05  u          A         v1
  6     5.77  d          A         c1
  7    -6.66  g          A         c1
  8     0.34  k          A         c1
  9     6.80  l          A         c1
 10     5.68  p          A         c1
 11    -2.33  r          A         c1
 12     5.48  s          A         c1
 13    -3.36  t          A         c1
 14    -8.98  w          A         c1
 15    -8.19  a-c        B         baseline
 16   -12.23  e          B         v1
 17    22.95  i          B         v1
 18    25.47  o          B         v1
 19    34.15  u          B         v1
 20    -5.20  d          B         c1
 21     7.28  g          B         c1
 22     0.09  k          B         c1
 23    -6.61  l          B         c1
 24    -5.82  p          B         c1
 25     2.02  r          B         c1
 26    -6.57  s          B         c1
 27     2.48  t          B         c1
 28     9.22  w          B         c1

In Table 4.7 we now see the results of the predictions. This time the model only
made one mistake: egrr. The reason why grip is correctly classified this time is
that the model finds it similar enough to liz, lip and gin, because it now knows
what its vowel is. Trying to reconstruct the evaluation of egrr is instructive. To
obtain the weight for A we add to the baseline (7.43) the weight for c1=g (-6.66)
and v1=e (11.87), which gives us 12.64, and we do the same for B (-8.19-12.23+7.28)
and we get -13.14. This clearly makes A win, but the node for c1 was pulling,
in both cases, for B. This means that even though the model made the wrong
choice, it did see a similarity between egrr and other B items (namely having g
as its first consonant). This can be seen in the probabilities in Table 4.7. Of those
items classified as A, egrr had the highest (even if small) probability of belonging
to class B. I will use this aspect of the analogical models in the next chapters to
measure similarity between classes.
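The addition of weights for egrr can be retraced with the figures quoted in the paragraph above. The dictionary layout and the function name are illustrative assumptions; only the six weights come from the text.

```python
# Weights quoted in the text for the baseline and for the levels c1=g and v1=e
WEIGHTS = {
    "A": {"baseline": 7.43, "c1=g": -6.66, "v1=e": 11.87},
    "B": {"baseline": -8.19, "c1=g": 7.28, "v1=e": -12.23},
}


def summed_weight(outcome, active_levels):
    """Baseline weight plus the weights of the levels present in the input."""
    weights = WEIGHTS[outcome]
    return weights["baseline"] + sum(weights[level] for level in active_levels)


egrr = ["c1=g", "v1=e"]
scores = {outcome: round(summed_weight(outcome, egrr), 2) for outcome in WEIGHTS}
print(scores)  # {'A': 12.64, 'B': -13.14}

# The c1=g node on its own lowers the score for A and raises the score for B.
print(WEIGHTS["A"]["c1=g"] < 0 < WEIGHTS["B"]["c1=g"])  # True
```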


Table 4.7: Predictions Model 2, including the probabilities for class A
and B

     Predicted  Probability A  Probability B  Observed  Word
  1  A          9.99e-01       3.25e-07       A         lama
  2  A          9.99e-01       3.25e-07       A         lara
  3  A          9.99e-01       5.20e-07       A         kar
  4  A          9.99e-01       3.18e-06       B         egrr
  5  B          5.14e-05       9.99e-01       B         liz
  6  B          1.83e-04       9.99e-01       B         oppe
  7  B          5.55e-09       1.00e+00       B         grip


The corresponding confusion matrix and statistics for Model 2 are given in
Table 4.8 and Table 4.9, respectively.


Table 4.8: Confusion matrix for Model 2

              Reference
  Prediction  A   B
  A           3   1
  B           0   3


Table 4.9: Overall statistics for Model 2

  Overall statistics:

  Accuracy             0.8571
  95% CI               (0.4213, 0.9964)
  No Information Rate  0.5714
  Kappa                0.72


One final important point is that we see how the CI and the Kappa are partially 
independent of each other. In Model 2, we obtained a Kappa of 0.72, which is 
considerably higher than what we would get in a random model, but because 
this metric is not sensitive to sample size, it fails to take into account the fact that 
there are only seven observations. The CI information, on the other hand, does 
take this into account and rightly tells us that we cannot draw any conclusion 
from this tiny dataset. I will use both metrics together when evaluating models. 

The models in the following chapters are too large and complex to either plot, 
or explore by hand. For this reason I will only make use of confusion matrices 
and accuracy scores to evaluate them, but in principle it would be possible for 
someone to inspect any of the analogical models presented here. 


4.4 Measuring variable importance 


An issue with neural networks is the fact that it is relatively difficult to
interpret the exact importance that the different factors have on the overall
model. Unlike in linear or logistic regression, we cannot directly explore the
coefficients. However, in some cases, it is important to understand which factor
plays a more or less important role in predicting some dataset. To address this
question we can make use of additive and subtractive modelling. The idea is very
simple. For subtractive modelling we start with the complete model (with all
predictors), and we compare its accuracy and kappa scores to those of models
leaving one predictor out. This technique allows us to compare the relative
importance of each individual predictor in the context of the complete model.
The additive variant of this idea consists of starting with a null model without
predictors, and, one by one, adding the original predictors back and comparing
the accuracy and kappa scores at each step.
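The two procedures can be sketched on the toy data from Section 4.3. To keep the example self-contained, the neural network is swapped for a much cruder stand-in: a majority-vote lookup over the training items that share the same predictor values, falling back to the most frequent training class for unseen combinations. This stand-in, and all names below, are assumptions made for illustration; only accuracy is compared.

```python
from collections import Counter

VOWELS = "aeiou"
TRAIN = {"A": ["lado", "laso", "pama", "ra", "dal", "tsar", "sek", "cess"],
         "B": ["lo", "loi", "lu", "lip", "roop", "toi", "olor", "gin", "wik"]}
TEST = {"A": ["lama", "lara", "kar"], "B": ["egrr", "liz", "oppe", "grip"]}
EXTRACTORS = {
    "v1": lambda stem: next(ch for ch in stem if ch in VOWELS),
    "c1": lambda stem: next(ch for ch in stem if ch not in VOWELS),
}


def evaluate(predictors):
    """Testing accuracy of a majority-vote lookup (a stand-in, not a neural network)."""
    def key(stem):
        return tuple(EXTRACTORS[p](stem) for p in predictors)

    votes = {}
    for label, stems in TRAIN.items():
        for stem in stems:
            votes.setdefault(key(stem), Counter())[label] += 1
    default = max(TRAIN, key=lambda label: len(TRAIN[label]))
    correct = total = 0
    for label, stems in TEST.items():
        for stem in stems:
            seen = votes.get(key(stem))
            guess = seen.most_common(1)[0][0] if seen else default
            correct += guess == label
            total += 1
    return correct / total


def subtractive(predictors):
    """Accuracy lost when each predictor is left out of the complete model."""
    full = evaluate(predictors)
    return {p: full - evaluate([q for q in predictors if q != p]) for p in predictors}


def additive(predictors):
    """Accuracy of the null model and of each model obtained by adding one predictor."""
    used, steps = [], [("null", evaluate([]))]
    for p in predictors:
        used.append(p)
        steps.append((p, evaluate(list(used))))
    return steps


losses = subtractive(["v1", "c1"])
steps = additive(["v1", "c1"])
print(losses)
print(steps)
```

On this toy data, leaving out v1 costs accuracy while leaving out c1 does not, which matches the observation that the first vowel is the main predictor.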


4.5 Clustering and distances between classes


The second important method I will use throughout this book is that of clustering
and measuring class similarity. Imagine the new made-up set of stems whose
inflectional class we want to predict, shown in (5):


(5) a. A: lama, lara, lado, laso, tama, ga, gal, tar, tsar, tek, tess
    b. B: egrr, liz, lo, loi, le, lep, loop, olpe, toi, olor, gen, grap, tak
    c. C: yrro, yrto, yro, undo, ujo, jyr, juk, juz, ryk


In this new example we have three classes (A, B and C) which are easily
described in terms of their first vowels and consonants. Words in A have an a, or
an e preceded by a t, s or g. Words in B have an o, i, a, or an e not preceded by a t
(except for tak). Words in C have a y or u, usually with an r or j. Additionally, we
can observe that there is a much greater similarity between A and B than between
C and the other two. Classes A and B can appear with an l or e, and to a lesser
degree t or g, while C does not.

We can fit a new model (Model 3) to this new dataset, again using the predictors
c1 and v1, and because the system is much more regular now, it should predict
perfectly the class of an item. What we really want to achieve now is measuring
the similarity between the three classes based on the analogical model. This can 
be done in different ways. In a model with few classes and lots of errors between 
the classes, we could look at the degree of confusion between any two classes and 
set classes with more confusion as more similar. In models with many classes 
this is less practical because class size is Zipf distributed (Blevins et al. 2016), 
which means that many classes will have very few members. In highly accurate 
models with very few errors, the measured similarity for small classes will be 
much less reliable. An alternative I will use in this situation is to directly use the 
probabilities predicted by the model. 

The probabilities for Model 3 can be seen in Table 4.10.⁷ In this table, each line
shows the probabilities a stem has of belonging to each of the three classes. So,
for lama, the probabilities are 0.8496 for class A, 0.1503 for class B, and 6.225e-09
for class C. From these probabilities we can build a (negative) correlation distance
matrix⁸ and from this, a distance matrix as shown in Table 4.11.


⁷For the purposes of this example I am not splitting the dataset into training and testing sets.
For the actual case studies the probabilities used come from the same cross-validation process.

⁸When using errors instead of probabilities the process is the same, but we take the negative
correlation measures for the confusion matrix instead.


Table 4.10: Predicted probabilities for Model 3

      Probability A  Probability B  Probability C  Word
  1   8.496e-01      1.503e-01      6.225e-09      lama
  2   8.496e-01      1.503e-01      6.225e-09      lara
  3   8.496e-01      1.503e-01      6.225e-09      lado
  4   8.496e-01      1.503e-01      6.225e-09      laso
  5   9.512e-01      4.873e-02      5.017e-12      tama
  6   5.987e-01      4.012e-01      3.328e-11      ga
  7   5.987e-01      4.012e-01      3.328e-11      gal
  8   9.512e-01      4.873e-02      5.017e-12      tar
  9   9.512e-01      4.873e-02      5.017e-12      tsar
 10   5.974e-01      4.025e-01      1.745e-11      tek
 11   5.974e-01      4.025e-01      1.745e-11      tess
 12   1.018e-01      8.981e-01      3.138e-11      egrr
 13   5.833e-23      1.000e+00      1.999e-19      liz
 14   2.353e-17      1.000e+00      4.455e-15      lo
 15   2.353e-17      1.000e+00      4.455e-15      loi
 16   3.006e-01      6.993e-01      1.220e-08      le
 17   3.006e-01      6.993e-01      1.220e-08      lep
 18   2.353e-17      1.000e+00      4.455e-15      loop
 19   2.353e-17      1.000e+00      4.455e-15      olpe
 20   8.127e-17      1.000e+00      1.107e-17      toi
 21   2.353e-17      1.000e+00      4.455e-15      olor
 22   1.018e-01      8.981e-01      3.138e-11      gen
 23   5.987e-01      4.012e-01      3.328e-11      grap
 24   9.512e-01      4.873e-02      5.017e-12      tak
 25   6.060e-11      7.044e-14      1.000e+00      yrro
 26   6.060e-11      7.044e-14      1.000e+00      yrto
 27   6.060e-11      7.044e-14      1.000e+00      yro
 28   1.308e-11      1.154e-13      1.000e+00      undo
 29   5.052e-12      7.686e-15      1.000e+00      ujo
 30   1.162e-09      1.756e-12      1.000e+00      jyr
 31   5.052e-12      7.686e-15      1.000e+00      juk
 32   5.052e-12      7.686e-15      1.000e+00      juz
 33   6.060e-11      7.044e-14      1.000e+00      ryk


Table 4.11: Correlation distances for Model 3

  Correlation matrix
        A       B       C
  A   0.000  -1.359  -1.533
  B  -1.359   0.000  -1.598
  C  -1.533  -1.598   0.000

  As distances
        A      B
  B   0.359
  C   0.533  0.598
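The figures in Table 4.11 can be retraced from the probabilities in Table 4.10. The sketch below assumes a plain Pearson correlation between the three probability columns, which is an assumption about the computation rather than a description of the actual procedure; the rows follow the order of the stems in (5), and the names ROWS and pearson are introduced here.

```python
from math import sqrt

# (P(A), P(B), P(C)) per stem, in the order of (5), as given in Table 4.10
ROWS = (
    [(8.496e-01, 1.503e-01, 6.225e-09)] * 4    # lama, lara, lado, laso
    + [(9.512e-01, 4.873e-02, 5.017e-12)]      # tama
    + [(5.987e-01, 4.012e-01, 3.328e-11)] * 2  # ga, gal
    + [(9.512e-01, 4.873e-02, 5.017e-12)] * 2  # tar, tsar
    + [(5.974e-01, 4.025e-01, 1.745e-11)] * 2  # tek, tess
    + [(1.018e-01, 8.981e-01, 3.138e-11)]      # egrr
    + [(5.833e-23, 1.000e+00, 1.999e-19)]      # liz
    + [(2.353e-17, 1.000e+00, 4.455e-15)] * 2  # lo, loi
    + [(3.006e-01, 6.993e-01, 1.220e-08)] * 2  # le, lep
    + [(2.353e-17, 1.000e+00, 4.455e-15)] * 2  # loop, olpe
    + [(8.127e-17, 1.000e+00, 1.107e-17)]      # toi
    + [(2.353e-17, 1.000e+00, 4.455e-15)]      # olor
    + [(1.018e-01, 8.981e-01, 3.138e-11)]      # gen
    + [(5.987e-01, 4.012e-01, 3.328e-11)]      # grap
    + [(9.512e-01, 4.873e-02, 5.017e-12)]      # tak
    + [(6.060e-11, 7.044e-14, 1.000e+00)] * 3  # yrro, yrto, yro
    + [(1.308e-11, 1.154e-13, 1.000e+00)]      # undo
    + [(5.052e-12, 7.686e-15, 1.000e+00)]      # ujo
    + [(1.162e-09, 1.756e-12, 1.000e+00)]      # jyr
    + [(5.052e-12, 7.686e-15, 1.000e+00)] * 2  # juk, juz
    + [(6.060e-11, 7.044e-14, 1.000e+00)]      # ryk
)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


columns = dict(zip("ABC", zip(*ROWS)))
correlations = {
    pair: pearson(columns[pair[0]], columns[pair[1]]) for pair in ("AB", "AC", "BC")
}
for pair, r in correlations.items():
    print(pair, "r =", round(r, 3), " r - 1 =", round(r - 1, 3), " -r =", round(-r, 3))
```

Under this assumption the correlations come out at about -0.359 (A and B), -0.533 (A and C) and -0.598 (B and C), so the upper panel of Table 4.11 appears to list r - 1 and the lower panel -r.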


From the distance matrix we can see that A is closest (has the smallest distance)
to B, and that the greatest distance is between B and C. Using the distance matrix
we can then build a dendrogram using hierarchical clustering⁹ (Rokach & Maimon
2005) as in Figure 4.3. Similarly, we can compress the information given
in the correlation matrix from three to two dimensions using multidimensional
scaling (MDS) (Borg & Groenen 2005; Cysouw 2007). Informally, MDS is a way
of visualizing high-dimensional data in a two-dimensional plot. It tries to
preserve as much of the original distance between two objects as possible. There
is an inherent data loss when using MDS, which means that the plots are an
approximation and that some of the information in the original distance matrix
is lost. Using this two-dimensional representation of the data we can plot the
categories on a two-dimensional plane as in Figure 4.4.

In this case, both representations agree with the observation from before: A
and B are closer to each other than to C. Additionally, Figure 4.4 shows that A
is somewhat closer to C than B is. For simple cases with only three groups I will
only make use of dendrograms, but for cases with many classes I will also use
MDS.
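The merging order behind a dendrogram like the one in Figure 4.3 can be illustrated with a very small agglomerative procedure over the distances of Table 4.11. Note the swapped-in technique: the sketch uses average linkage because it is easy to write down, not Ward's method, so the merge heights need not match those of the figure. All names are hypothetical.

```python
def agglomerate(distances):
    """Average-linkage agglomerative clustering over a dict of pairwise distances.

    Returns the list of merges as (cluster, cluster, height) tuples.
    """
    labels = sorted({label for pair in distances for label in pair})
    clusters = [frozenset([label]) for label in labels]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(distances[frozenset(p)] for p in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        height = dist(clusters[i], clusters[j])
        merges.append((sorted(clusters[i]), sorted(clusters[j]), round(height, 4)))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Distances between the three classes, from the lower panel of Table 4.11
DISTANCES = {frozenset("AB"): 0.359, frozenset("AC"): 0.533, frozenset("BC"): 0.598}
merges = agglomerate(DISTANCES)
print(merges)
```

A and B are merged first, and C joins the resulting cluster last, which is the grouping described in the text.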


⁹For the clustering I use Ward's linkage method. Although Ward's method (Murtagh &
Legendre 2014) is designed to be applied to Euclidean distances, some recent studies have
shown it performs very well with other distance metrics (Meyniel et al. 2010; Strauss & von
Maltitz 2017).

Figure 4.3: Dendrogram based on correlation distances for Model 3

Figure 4.4: MDS based on correlation distances for Model 3


4.6 Summing up


I have shown in this chapter that building analogical models with neural net- 
works is not conceptually different from finding analogical relations by hand. 
The statistical models are, to a great extent, a notational variant of informal de- 
scriptions or schemas. They have the advantage that they require less manual 
work and can be easily applied to very large datasets. The clustering analysis with
dendrograms and the MDS analysis for finding similar classes do not produce
substantially different results from what a linguist would arrive at by inspecting
the items manually. As stated before, there is no claim about the cognitive
reality or psycholinguistic plausibility of the neural networks themselves. Neural
networks are simply tools. The claim is that the analogical relations are present 
in the data, and speakers can thus make use of these relations. 

In the next chapters I will use these tools to explore different analogical sys- 
tems in various languages. Part II contains four chapters besides this one, each 
corresponding to a general topic and containing at least two case studies. Chap- 
ter 5 deals with some general gender issues in Latin and Romanian. This chapter 
introduces the basic claim, and shows how analogical relations that predict gen- 
der in nouns have a correlate with the hierarchy. Chapter 6 shows what happens 
in systems where simple trees are not enough and we need hybrid types in the 
hierarchy. This chapter deals with the topic of overabundance and affix compe- 
tition in Russian and Croatian. Chapter 7 explores the claim that we need struc- 
tural information in the analogical models. I present examples from prefixing 
languages (Swahili and Otomí de la Sierra), where the analogical process takes 
place on the first segments of the items, and Hausa, where the analogical
specification requires more structure than for other languages. Chapter 8 presents three
cases of complex inflectional systems: Spanish verb classes and Kasem number
classes. This chapter provides the strongest evidence for the interaction between
type hierarchies and analogical processes. Finally, Chapter 9 sums up the results
and their implications for both usage-based and formal linguistics.


In this chapter, I discuss two cases of gender assignment and inflection class
interaction: Latin third declension nouns and Romanian nouns. The question of 
gender assignment is an old one, and there are many papers proposing analogical 
models to account for this phenomenon in different languages. Some early work 
on the matter concluded that "there seems to be no practical criterion by which 
the gender of a noun in German, French or Latin [can] be determined" (Bloom- 
field 1933: 280). But since Bloomfield, there has been great progress towards estab- 
lishing the opposite conclusion: "French grammarians have been hasty in their 
conclusion that there are no regularities or only minimal ones to gender deter- 
mination" (Tucker et al. 1968: 316), and "gender can be predicted for a large pro- 
portion of German nouns, and that there is a complex interplay of overlapping 
semantic, morphological and phonological factors" (Corbett 1991: 49). 

Corbett (1991), for example, reports on a series of languages where he notes 
that the shape of nouns is a strong predictor of their inflectional class and gender: 


Declensional type may in turn overlap with phonology; it may be possible 
to predict the declensional type from the phonological shape of the stem. 
Where this is systematically the case, we shall consider it to be phonolog- 
ical assignment; this is the simpler claim, since phonological information 
must in any case be stored in the lexicon (Corbett 1991: 34) 


The most relevant work on gender prediction can be found in C. Matthews 
(2005; 2010, see also Lyster 2006) where the author looked at French¹ gender
assignment. Matthews (2010: 879) found that “the results [of the model] show 
that not only does the final syllable prove a reliable indicator but that it is, in 
fact, more reliable than most other sequences" (see Marchal et al. (2007) and 
Seigneuric et al. (2007) for evidence that children use these cues when learning 
French nouns, but compare Boloh & Ibernon (2010)). Similar to French, gender 
assignment in Spanish has received a lot of attention (Morin 2006; Sánchez 1995; 
Smead 2000), including some analogical computational models (Eddington 2002). 


¹Non-Indo-European languages have received considerably less attention, exceptions being
Navajo (Southern Athabaskan) (Eddington & Lachler 2006; McDonough 2013) and Swahili
(Bantu), discussed in Chapter 7.




Similarly, for German, there is a vast amount of background on how speakers 
predict the gender of nouns (Hahn & Nakisa 2000; Köpcke & Zubin 1984; Köpcke
et al. 2010; Salmons 1993; Schwichtenberg & Schiller 2004; Zubin & Köpcke 1986;
1984). In Köpcke & Zubin (1984), the authors propose a series of schemata for
predicting the gender of German monosyllabic words, with 90% accuracy. These 
schemata are partly phonological and partly semantic. The authors also found 
several semantic factors underlying the system, like the fact that specific con- 
cepts tend to be feminine or masculine, while more abstract concepts tend to be 
neuter. 

These studies have mostly focused on the properties of the system but others 
have also explored the cognitive underpinnings of gender assignment, and how 
analogical systems are actually responsible for how gender is assigned to new 
nouns (Holmes & Segui 2004; Caffarra et al. 2015; Caffarra & Barber 2015; Taylor 
2012). 

A key point worth emphasizing is the difference between gender and inflection
class: gender relates to agreement, whereas inflection class concerns the actual
markers. The need to differentiate between the two has been pointed out before
(Aronoff 1994; Harris 1991), but the distinction is not always observed. Although
there is often a correlation between gender and inflection class in nouns and
adjectives, as the examples in this chapter will show, this correlation is only partial.

In the following two sections, I explore two languages which have received 
less attention from an analogical perspective: Latin and Romanian. These two 
showcases were chosen due to the shapes of their systems. In Latin, we have a 
very tree-like hierarchy, which allows us to explore what happens in simple con- 
figurations with few classes. The Romanian gender-inflection class interaction 
offers a more complex case, in which there are multiple proposals regarding the 
correct number of genders found in Romanian, and how they relate to inflection 
class. 


5.1 Masculine-feminine syncretism: Latin 


5.1.1 The Latin third declension 


In the Latin third declension, we find syncretisms between the masculine and
feminine nouns². Table 5.1 shows that the masculine noun pater ‘father’ and the
feminine noun vox ‘voice’ have the same inflectional endings, while the neuter


²The reason for only focusing on third declension nouns is precisely that this is the only
declension class in Latin where we clearly find all three genders abundantly represented.
Focusing only on one of the five declension classes also means that we are removing the
effects of crossing trees like in Romanian, Spanish or Kasem.

noun nomen ‘name’ presents a different set of endings. Some gender assignment 
rules have already been proposed for these nouns. Aronoff (1994) proposes a 
series of regularities but in the end does not pursue a completely formalized 
system. 


Table 5.1: Paradigms for pater ‘father’, vox ‘voice’ and nomen ‘name’


        masculine               feminine                neuter
        singular   plural       singular   plural       singular    plural
nom.    pater      patr-ēs      vōx        vōc-ēs       nōmen       nōmin-a
acc.    patr-em    patr-ēs      vōc-em     vōc-ēs       nōm-en      nōmin-a
gen.    patr-is    patr-um      vōc-is     vōc-um       nōm-inis    nōmin-um
dat.    patr-ī     patr-ibus    vōc-ī      vōc-ibus     nōm-inī     nōmin-ibus
abl.    patr-e     patr-ibus    vōc-e      vōc-ibus     nōm-ine     nōmin-ibus


This same third declension syncretism is also found in adjectives. Take for 
example the paradigm for vetus ‘old’ in Table 5.2. Again, masculine and feminine 
classes take the same endings. 


Table 5.2: Paradigm for vetus, veteris ‘old’


        masculine/feminine         neuter
Case    singular    plural         singular    plural
nom.    vetus       veter-es       vetus       veter-a
acc.    veter-em    veter-es       vetus       veter-a
gen.    veter-is    veter-um       veter-is    veter-um
dat.    veter-i     veter-ibus     veter-i     veter-ibus
abl.    veter-e     veter-ibus     veter-e     veter-ibus
voc.    vetus       veter-es       vetus       veter-a
loc.    veter-i     veter-ibus     veter-i     veter-ibus


From a declension class perspective, this system is fairly simple³. The hierarchy
in Figure 5.1 basically says that feminine and masculine form a class, which
easily captures the syncretism in that one inflectional construction will apply to
neuters and one to non-neuters for the third declension.


³It is simple because it only considers the third declension. The complete nominal declension
system is much more complex.


[Tree diagram; node labels: nom-infl, gender; second-infl, third-infl, fourth-infl, neuter,
feminine, masculine; neut-infl, non-neut-infl]


Figure 5.1: Latin noun inflection class hierarchy 


One set of constructions (or rules, etc.) would apply to the neuter type, while a 
different set would apply to the non-neuter-infl type, thus producing the observed 
syncretisms. The expectation would then be that masculine and feminine lexemes 
in the third declension should be more similar to each other than to neuter nouns. 


5.1.2 Data 


I extracted all third declension nouns from the digital Latin dictionary Words 
by Whitaker (2019). The totals by gender (after removing nouns marked with 
common gender, e.g. celestis ‘divinity’) are: 7773 feminine, 2993 masculine, 1499 
neuter. We can see that there are many more feminine nouns than neuter or
masculine nouns.⁴

As the basis for the analogy, I used the stems provided in the dictionary. This 
is likely to introduce an accuracy bias into the model, as it does not filter out
derivational morphology. Some Latin suffixes are gender assigning: third declension
nouns ending in -tor are mostly masculine, with around four exceptions: caritor
‘wool-carders’ (feminine), litor ‘beach, landing place’ (neuter), pector ‘breast,
heart’ (neuter). It is clear that these cases do not really contain a derivational
-tor suffix but rather happen to end in a tor sequence. Similar cases for feminine
nouns are gender assigning suffixes like -tat and -tas: absurditas ‘absurdity’.

This particularity of the dataset, however, should not really represent a prob- 
lem for the question at hand. It is true that the model will confound some mor- 
phological with phonological analogies, but these effects should have no impact 


⁴Note that we would expect such a disproportion to favour a model that grouped feminine
nouns against neuter and masculine nouns. The reason for this is that when a model cannot
reliably predict the class of some item, it tends to assign it to the most frequently observed class
(since this is the most likely outcome). In other words, in a no-information situation, it is more
likely that a noun will be feminine than masculine or neuter.

either way on the similarity clustering over the three classes. If anything, the
additional morphological information would reduce confusion between masculine
and feminine classes. Nonetheless, I present models on two datasets, one which 
includes all derived nouns and a reduced dataset excluding clear cases of gender- 
assigning suffixes. The number of nouns by gender in the reduced dataset is: 6626 
feminine, 2153 masculine and 1496 neuter. 


5.1.3 Results 


We fit an analogical model to the Latin data using the formula: gender ~
final.1 + final.2 + final.3 + num vowels. This model looks at the last three
segments of the stem and the number of vowels. The results can be seen in
Table 5.3 and the corresponding statistics in Table 5.4.
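
As an illustration of the predictors named in the formula, the following minimal sketch derives the three final segments and the vowel count from a stem. It rests on assumptions that are not part of the original study: segments are approximated by orthographic characters, final.1 is taken to be the last segment, short stems are padded with a hypothetical "#" symbol, and the function name stem_features, the key num_vowels and the vowel inventory are illustrative only.

```python
# Hypothetical feature extraction for a stem-based analogical model.
# Assumption: one orthographic character = one segment.
VOWELS = set("aeiouy")  # assumed vowel inventory (orthographic)


def stem_features(stem: str) -> dict:
    """Return the last three segments and the number of vowels of a stem."""
    padded = stem.rjust(3, "#")  # pad stems shorter than three segments
    return {
        "final.1": padded[-1],  # assumed: final.1 is the last segment
        "final.2": padded[-2],
        "final.3": padded[-3],
        "num_vowels": sum(ch in VOWELS for ch in stem),
    }


if __name__ == "__main__":
    # Stems taken from the paradigms in Table 5.1 (macrons omitted).
    for stem in ("patr", "voc", "nomin"):
        print(stem, stem_features(stem))
```

Each noun then contributes one row of categorical predictors plus a count, which is the kind of input the formula presupposes.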


Table 5.3: Confusion Matrix for the model predicting gender of Latin
third declension nouns


              Reference
Prediction    Feminine   Masculine   Neuter
Feminine      7244       569         77
Masculine     432        2236        196
Neuter        97         188         1226


Table 5.4: Overall statistics for Confusion Matrix Table 5.3


Overall statistics:

Accuracy : 0.8729
95% CI : (0.8669, 0.8787)
No Information Rate : 0.6338
Kappa : 0.7557

Statistics by Class:

                     Feminine   Masculine   Neuter
Sensitivity          0.9319     0.7471      0.8178
Specificity          0.8562     0.9323      0.9735
Balanced Accuracy    0.8941     0.8397      0.8957


Table 5.5: Confusion Matrix for the model predicting gender of Latin
third declension nouns


              Reference
Prediction    Feminine   Masculine   Neuter
Feminine      6114       577         70
Masculine     420        1391        182
Neuter        92         185         1244
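
The statistics reported in Table 5.4 can be recomputed from the confusion matrix in Table 5.3. The sketch below does this with the standard definitions of accuracy, no information rate (share of the most frequent reference class), Cohen's kappa, sensitivity, specificity and balanced accuracy. The function names are hypothetical, and the code is an illustration rather than the software used for the original analysis.

```python
# Confusion matrix from Table 5.3: rows = predictions, columns = reference.
LABELS = ["Feminine", "Masculine", "Neuter"]
CONFUSION = [
    [7244, 569, 77],
    [432, 2236, 196],
    [97, 188, 1226],
]


def overall_stats(m):
    """Return (accuracy, no-information rate, Cohen's kappa)."""
    k = len(m)
    n = sum(sum(row) for row in m)
    row_totals = [sum(row) for row in m]                      # predicted totals
    col_totals = [sum(m[i][j] for i in range(k)) for j in range(k)]  # reference totals
    accuracy = sum(m[i][i] for i in range(k)) / n
    nir = max(col_totals) / n
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, nir, kappa


def class_stats(m, j):
    """Return (sensitivity, specificity, balanced accuracy) for class j."""
    k = len(m)
    n = sum(sum(row) for row in m)
    tp = m[j][j]
    ref_pos = sum(m[i][j] for i in range(k))  # items that truly belong to class j
    fp = sum(m[j]) - tp                       # items wrongly predicted as class j
    tn = n - ref_pos - fp
    sensitivity = tp / ref_pos
    specificity = tn / (n - ref_pos)
    return sensitivity, specificity, (sensitivity + specificity) / 2


if __name__ == "__main__":
    print("accuracy, NIR, kappa:", overall_stats(CONFUSION))
    for j, label in enumerate(LABELS):
        print(label, class_stats(CONFUSION, j))
```

The gap between accuracy and the no information rate is what shows that the model does better than always guessing the most frequent gender.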


The equivalent model for the reduced dataset can be seen in Table 5.5 and the
corresponding statistics in Table 5.6. For both datasets, the results are almost
identical. As expected, the smaller dataset produces slightly worse results,
because the nouns removed were amongst the easily predicted ones⁵. A clustering
analysis of both models can be seen side-by-side in Figure 5.2.


Table 5.6: Overall statistics for Confusion Matrix Table 5.5


Overall statistics:

Accuracy : 0.8515
95% CI : (0.8445, 0.8583)
No Information Rate : 0.6449
Kappa : 0.7108

Statistics by Class:

                     Feminine   Masculine   Neuter
Sensitivity          0.9227     0.6461      0.8316
Specificity          0.8227     0.9259      0.9684
Balanced Accuracy    0.8727     0.7860      0.9000


Figure 5.2 shows that the feminine and masculine nouns are closer to each other
than they are to neuter nouns. This confirms the expectations of the Arc model 
and matches the inflectional system where we find syncretism between the mas- 
culine and feminine. 

In conclusion, I have shown that in a very simple system like the one of Latin 
third declension nouns, the analogical model makes exactly the right predictions 


⁵Because the derivational suffixes are identified by the model as sequences that reliably predict
gender.

[Two-panel figure: (a) Clustering analysis on Table 5.4. (b) Clustering analysis on Table 5.6.]


Figure 5.2: Clustering analysis for Latin gender assignment 


about how the three genders should cluster together based on formal properties 
of the stems. We see the same result for both datasets, with and without gender 
assigning suffixes. 


5.2 Gender vs inflection class: Romanian 


5.2.1 The Romanian gender and plural system 


A much more interesting gender and number system can be found in Romanian. 
Like Latin, Romanian is often analyzed as having three genders, which it
inherited from Latin (Gönczöl 2007: 23). The interesting aspect of Romanian gender
is that the neuter does not have a dedicated marker, but patterns with the mas- 
culine in the singular and with the feminine in the plural. As Cojocaru explains, 
this phenomenon can be observed on all elements that agree with a noun. 


The distinctive part of the neuter gender in Romanian is that it does not 
have any formal particularities. The neuter nouns in the singular look like 
masculine nouns, while in the plural they look like feminine nouns. The 
same applies to adjectives, pronouns and pronominal adjectives. When 
they modify or replace a neuter noun in the singular they appear in their 
masculine singular form, and when they modify or substitute a neuter 
noun in the plural they appear in their feminine plural form. (Cojocaru 
2003: 27) 


One striking example of this situation is illustrated by the three inflection
classes in Table 5.7, each of which is found in only one gender. In this part of
the system, neuter nouns inflect like masculine nouns in the singular, and like
feminine nouns in the plural. This means that, while there are no specific markers
for neuter nouns, there is a three-way split in the system.


Table 5.7: Three way gender system in Romanian


             singular   plural    gloss
masculine    codr-u     codr-i    forest
neuter       muze-u     muze-e    museum
feminine     cas-ă      cas-e     house


In terms of agreement, we see the same phenomenon (adapted from Farkas
1990), as can be observed in examples (1)-(3):


(1) masculine
    a. Un         trandafir  alb            e   scump.
       a.MASC.SG  rose       white.MASC.SG  is  expensive.MASC.SG
       ‘A white rose is expensive.’
    b. Unii          trandafiri  alb-i          sunt  scump-i.
       some.MASC.PL  roses       white-MASC.PL  are   expensive-MASC.PL
       ‘Some white roses are expensive.’

(2) feminine
    a. O         garoafă    alb-ă         e   scump-ă.
       a.FEM.SG  carnation  white-FEM.SG  is  expensive-FEM.SG
       ‘A white carnation is expensive.’
    b. Unele        garoafe     alb-e         sunt  scump-e.
       some.FEM.PL  carnations  white-FEM.PL  are   expensive-FEM.PL
       ‘Some white carnations are expensive.’

(3) neuter
    a. Un         scaun  alb            e   scump.
       a.MASC.SG  chair  white.MASC.SG  is  expensive.MASC.SG
       ‘A white chair is expensive.’
    b. Unele        scaune  alb-e         sunt  scump-e.
       some.FEM.PL  chairs  white-FEM.PL  are   expensive-FEM.PL
       ‘Some white chairs are expensive.’


Here we have the identical type of distribution for agreement, as we saw for
markers in Table 5.7. The word alb ‘white’ has the same marker (namely -ø) when
modifying a masculine or neuter noun in the singular, and it has the same marker
(namely -e) when modifying a neuter or feminine noun in the plural. So, even
though there are only two different agreement markers in the plural and in the
singular⁶, the alignment pattern produces three genders. Additionally, Romanian
has a relatively complex inflection class system for singular and plural. Table 5.8
presents the basic classes (Cojocaru 2003)⁷. Usually, the singular is taken to be a
sort of simplex form, instead of being composed of a stem and a singular marker.
I take a slightly different approach here and consider the singular to be composed
of a stem and a singular marker.

Table 5.8 shows the problematic issue in the interaction between Romanian 
gender and number markers. Although gender correlates with inflection class, 
knowing the gender of a noun in Romanian is not enough for knowing its plu- 
ral (or singular) form. Based on this fact, it has been argued that Romanian does 
not have three genders but rather a complex interaction between number mark- 
ers. The most recent of these accounts is offered by Bateman & Polinsky (2010). 
The authors, partially following previous proposals by Hall (1965) and Farkas & 
Zec (1995), claim that there are only two genders in Romanian, masculine and 
feminine, and that it is not gender that determines plural formation, but plural 
formation that helps determine gender in Romanian: 


Our position is supported by the fact that in traditional three-gender anal- 
yses there is limited predictability of plural endings for nouns in the same 
class, clearly showing that gender specification alone does not predict plu- 
ral form (Bateman & Polinsky 2010: 53) 


Similarly, they claim that a two-gender system for Romanian is more parsimo- 
nious than a three gender system because "the same factors relevant for plural 
formation are indirectly relevant for predicting gender assignment and agree- 
ment in the plural" (Bateman & Polinsky 2010: 45). 

To address the problem of agreement in Romanian nouns, Bateman & Polinsky 
(2010: 52) propose that "Romanian has two noun classes in the singular and in 
the plural" but clarify that "this categorization is not lexically specified". These 
classes are in turn different for plural and singular. The first aspect of their system 
is easy to model with a type system, but the clarification that said noun classes 
are not lexically specified, less so. For this model, all lexemes in Romanian would 


⁶In (1) one could argue either that the consonant is the marker, or that there is a -Ø marker.
⁷Classes iu-ie and u-ă in the neuter are classified as exceptions by the author.


Table 5.8: Number inflection classes in Romanian


Sg. marker    Pl. marker    Singular     Plural       Gloss

Masculine nouns
-C            -C+i          elev         elevi        ‘school student’
-u            -i            leu          lei          ‘lion’
-e            -i            iepure       iepuri       ‘rabbit’
-i            -i            ochi         ochi         ‘eye’
-ă            -i            tată         tați         ‘father’

Feminine nouns
-ă            -e            casă         case         ‘house’
-ă            -i            ușă          uși          ‘door’
-ă            -uri          marfă        mărfuri      ‘merchandise’
-e            -i            lume         lumi         ‘world’
-V+ie         -V+i          baie         băi          ‘bathroom’
-C+ie         -C+ii         frecție      frecții      ‘massage’
-a            -ale          basma        basmale      ‘handkerchief’
-ea           -ele          cafea        cafele       ‘coffee’
-i            -i            marți        marți        ‘Tuesday’

Neuter nouns
-C            -C+uri        tren         trenuri      ‘train’
-C            -C+e          capac        capace       ‘lid’
-u            -uri          lucru        lucruri      ‘thing’
-u            -e            muzeu        muzee        ‘museum’
-u            -ă            ou           ouă          ‘egg’
-iu /ju/      -ii /ij/      exercițiu    exerciții    ‘exercise’
-iu /iw/      -ie /i.e/     sicriu       sicrie       ‘coffin’
-i /j/        -ie /je/      tramvai      tramvaie     ‘tram’
-i /i/        -iuri         taxi         taxiuri      ‘taxi’
-e            -e            nume         nume         ‘name’


be underspecified for a lexical class feature, which would only be specified after
an inflectional process derives the singular or plural form of the noun.

For the singular, Bateman & Polinsky (2010) propose classes A and B. Tra- 
ditionally feminine nouns belong to class A, while traditionally masculine and 
neuter nouns belong to class B. Class assignment in the singular is driven by 
both formal and semantic features (since animate nouns can straightforwardly 
be categorized as feminine or masculine, as well as other minor semantic classes). 
For the plural, the authors propose classes C and D. Class D includes traditional 
masculine nouns, while class C includes all other nouns. Again, class member- 
ship is determined by semantic and formal cues, where the formal cues are the 
plural endings of the nouns. A graphic representation of the two gender model
is shown in Figure 5.3.


[Diagram: feminine, neuter and masculine nouns linked to the singular classes A and B
and to the plural classes C and D]


Figure 5.3: Two gender model for Romanian 
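
The class assignments just described can be written down as two lookup tables, one per number. The following sketch is only an illustration of the proposal as summarized above (A/B in the singular, C/D in the plural); the variable and function names are hypothetical. It shows that two binary classes still yield three distinct agreement patterns, one per traditional gender.

```python
# Two-gender model as summarized in the text:
# singular classes A/B and plural classes C/D (names of the dicts are illustrative).
SINGULAR_CLASS = {"feminine": "A", "masculine": "B", "neuter": "B"}
PLURAL_CLASS = {"feminine": "C", "masculine": "D", "neuter": "C"}


def agreement_pattern(gender: str) -> tuple:
    """Pair the singular class with the plural class of a traditional gender."""
    return SINGULAR_CLASS[gender], PLURAL_CLASS[gender]


PATTERNS = {gender: agreement_pattern(gender) for gender in SINGULAR_CLASS}

if __name__ == "__main__":
    for gender, pattern in PATTERNS.items():
        print(gender, pattern)
    # Number of distinct (singular class, plural class) combinations.
    print(len(set(PATTERNS.values())))
```

Of the four logically possible combinations, only the pairing of class A with class D is left unused under this assignment.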


In the model by Bateman & Polinsky (2010), it is the plural class that deter- 
mines gender: 


In fact, with the exception of traditional masculines, all of which take the 
plural marker -i, there are very few feminine and neuter nouns for which 
gender classification alone can predict plural form. For example, feminine 
nouns ending in -e take the -i plural marker [...]. As we mentioned previously,
there are also feminine nouns ending in stressed -a or -ea that take
the -le plural marker, and there are neuter nouns ending in a stressed -i and
borrowings from French ending in -ou that take -uri in the plural. Notice
that in each of these cases the plural ending is determined by the noun's 
ending rather than its gender class, which supports our claim that the plu- 
ral forms determine class membership in the plural, rather than the other 
way around (Bateman & Polinsky 2010: 54) 


This approach has recently received some support from a computational model. 


Dinu et al. (2012) present a system based on two support vector machines, one
trained on plurals and one trained on singulars, which manages to distinguish
neuter nouns very well, at around 99% accuracy (for two previous computational
approaches see Cucerzan & Yarowsky 2003 and Nastase & Popescu 2009). Dinu
et al. (2012: 123) mention that their model supports Bateman and Polinsky's (2010)
model, as plural class is in fact distinguishable for nouns from purely formal cues,
and that gender is not needed. It is not completely clear from the study by Dinu
et al. (2012), however, that gender would not provide extra information
about plural formation. First, the authors looked at singular and plural words in
the nominative, which means that their model had number information which is
highly correlated with gender. A second issue is that the authors only considered
the effect of formal cues for predicting gender, but did not fit a model that took
into account the effect of gender.

Other solutions for modelling gender in Romanian have been proposed in dif- 
ferent linguistic theories. Probably the most well known is Farkas & Zec (1995), 
who take an underspecification approach, where feminine nouns are specified as 
+FEM, masculine nouns as -FEM, and neuter nouns are not specified for gender
(see also Farkas 1990, as well as Sadler 2006; Wechsler 2008; Kramer 2015). These 
approaches assume the existence of three genders, but diverge in how exactly 
their interrelations are implemented. 

A different approach is pursued by Steriade (2008). Steriade identifies some 
phonological constraints on the plural choice of some nouns. Her approach, how- 
ever, focuses on the different phonological processes that stems undergo with 
certain markers, rather than on the actual choice of different markers. I will ig- 
nore stem processes in this study, but the approach developed in Chapter 8 for 
Spanish could be extended to the Romanian system. 


5.2.2 Modelling the system 


The two gender model as presented by Bateman & Polinsky (2010) has a con- 
ceptual problem. There are three types of nouns in Romanian based on their 
agreement behaviour. The discussion of how this originates and what features 
are responsible for this phenomenon is somewhat of a red herring. The fact is 
that we have three agreement patterns, and whether we need a lexically speci- 
fied feature for this is a different question. Additionally, the argument that we 
do not need gender because inflection class is predictable from formal features, 
and because gender does not completely determine inflectional class, is not very 
convincing. 

It is not really surprising that declension class is partially independent of gen- 
der, since this is not all that rare typologically speaking (Corbett (1991), and Chap- 
ter 8), and it is even present in other Romance languages. A simple example is 
Spanish, where exponents of the singular only partially correlate with gender
(Harris 1991). Similarly, the fact that gender does not completely determine
inflection class does not entail that gender has no information about the inflection
class of a noun, independently of other formal features on the noun. Incomplete 
information does not mean no information. 

As for the Romanian system, what does make sense in the two-gender model
proposed by Bateman & Polinsky (2010) is to have four agreement classes, two for
singular and two for plural, and many different actual inflection classes according
to the singular-plural marker combination. This idea is depicted in the hierarchy
in Figure 5.4.


[Tree diagram; node labels: agreement; F, N, M]


Figure 5.4: Romanian Gender-Number hierarchy 


This hierarchy is exclusively about agreement because it indicates what the 
agreement would be with a given adjective. Notice that only listing the plural 
and singular for each noun is insufficient because adjectives do not agree with 
nouns in terms of markers, but in terms of gender as shown in (4). 


(4) a. tren-uri       mic-i
       train.MASC.PL  small.MASC.PL
       ‘small trains’
    b. lum-e         mic-ă
       world.FEM.SG  small.FEM.SG
       ‘small world’
    c. lucr-u         mic
       thing.NEUT.SG  small.NEUT.SG
       ‘small thing’

The adjective mic ‘small’ has three forms: mic (masculine and neuter singular),
mică (feminine singular) and mici (plural). This adjective does not agree with the
number markers on the nouns, but with their gender.

Because, as we saw, gender does not completely determine inflection class,
this dimension needs to be modelled separately. For each gender, there are some
markers, either singular or plural, which are unique to said gender. So, for
example, the plural marker -iuri is only found with neuter nouns, the singular marker
-C+ie only occurs with feminine nouns, and the plural marker -C+i is only found
with masculine nouns. This is crucial because we cannot claim that masculine
and neuter nouns share all singular markers, or that feminine and neuter nouns
share all plural markers. Markers like -ă are found in the singular with feminine
and masculine nouns, and in the plural with neuter nouns. Except for the classes
where both the singular and plural use the same marker, markers are uniquely
determined by the number they express. That is, even though the marker -ă can
express singular or plural, knowing the gender of the noun immediately resolves
the uncertainty. In the case of -e in feminine nouns, we need to know the other
marker of the noun to be able to tell whether -e is a plural or singular marker.

The issue becomes even clearer when we look at how the different number
classes distribute across genders in Table 5.9 (see next section for an explanation
of the dataset). What we clearly see here is that, with the exception of the classes⁸
ă-i, e-e, e-uri and i-i, declension class determines gender. We also see that the
confusion is with the feminine, i.e. the masculine and neuter classes are never 
confused. Notice that this has the reverse structure of the agreement pattern, 
where neuter patterns with masculine and feminine, but these two do not pattern 
together. 

There are some additional classes not listed in Cojocaru (2003). For example, 
nutria ‘otter’ forms its plural as nutrii. Similarly, anaconda ‘anaconda’ forms its 
plural as anaconde. I leave these classes in as they are, but recognize that they 
might be special cases of foreign words or particular exceptions.⁹

The distribution of singular and plural markers can be seen in Table 5.10 and 
Table 5.11. In these distributions, we find something similar to what we had in the 
distribution of classes. Although there are markers that are shared by all three 
genders, namely -i and -e in the singular, there are no markers that are only 
shared by the neuter and feminine in the singular, and the only marker shared 


⁸I use the notation sg-pl.

⁹The markers -i and -Vi for the plural feminine could be collapsed into a single -i marker. For
consistency with Cojocaru (2003), I keep them as distinct markers, but in the end this decision
should not really make much of a difference.


Table 5.9: Number classes by gender in Romanian


           feminine   masculine   neuter
a-ale      172        0           0
a-e        178        0           0
ă-e        11647      0           0
a-i        56         0           0
ă-i        2855       51          0
ă-uri      1590       0           0
C-e        0          0           7746
C-i        0          7252        0
C-ii       0          0           25
C-uri      0          0           5586
ea-ale     1          0           0
ea-ele     384        0           0
e-e        807        0           155
e-i        13814      227         0
e-iuri     0          0           3
e-uri      17         0           90
i-e        0          0           31
ie-e       112        0           0
ie-ii      6771       0           0
ie-Vi      171        0           0
i-i        75         567         0
i-ie       0          0           189
i-iuri     0          0           237
iu-ie      0          0           19
iu-ii      0          0           348
u-ă        0          0           1
u-e        0          0           936
u-i        0          700         0
u-uri      0          0           456


by the masculine and feminine is -ă, with a suspiciously low type frequency in
the masculine. On the other hand, in the plural, except for -i, sharing of markers
is only found between neuter and feminine.

With these facts in mind, there are three alternatives for a hierarchy of number 
markers in Romanian. If we wanted to keep the symmetry in the hierarchy be- 
tween plural and singular, we could separate the markers that cross ‘the wrong’ 
classes into two. There are two potential justifications for this move, one theoret- 
ical and one empirical. Thinking in terms of simplicity, adding three additional 
singular markers, and one additional plural marker reduces the complexity of 
the hierarchy. The second reason has to do with the relative type frequencies 
of the problematic markers. If we look at their distributions, in the singular, -ă
and -e, as shown in Table 5.10, are much more common with the feminine than 
with the masculine or neuter. In a similar way, -i has more or less the same type 
frequency for the neuter and masculine, and it is less frequent with the feminine. 
In the plural, -i is much more frequent with the feminine than the masculine. 


Table 5.10: Singular classes by gender in Romanian


        Feminine   Masculine   Neuter
a       406        0           0
ă       16092      51          0
C       0          7252        13357
e       14638      227         248
ea      385        0           0
i       75         567         457
ie      7054       0           0
iu      0          0           367
u       0          700         1393


Pursuing a symmetric approach, the system would have separate feminine and
non-feminine variants of -e, -ă and -i in the singular, and separate masculine and
non-masculine variants of -i in the plural. In Figure 5.5 we see
what a hierarchy under these assumptions would look like.

An alternative is to have an asymmetric hierarchy, but fewer individual markers.
A sketch of this hierarchy can be seen in Figure 5.6. In this case, there is no
real symmetry between the singular and the plural, nor is there any with the 
agreement patterns. 

The final inflection classes result from specifying pairings between the singu- 
lar and plural markers shown in Table 5.8. Since there is no free combination 
between singular and plural markers, each class must be specified directly.


Table 5.11: Plural classes by gender in Romanian


        Feminine   Masculine   Neuter
ă       0          0           1
ale     173        0           0
e       12744      0           8868
ele     384        0           0
i       16800      8797        0
ie      0          0           208
ii      6771       0           373
iuri    0          0           240
uri     1607       0           6132
Vi      171        0           0
[Tree diagram; root label: number markers]

Figure 5.5: Romanian marker hierarchy

Figure 5.6: Symmetric Romanian marker hierarchy


The third alternative consists of having two independent flat lists for singular and
plural markers and then specifying each inflection class as in Figure 5.6. A
simplified hierarchy for this approach is given in Figure 5.7. The advantage of this model
is that it is simpler than the previous two, in that it requires less complex in- 
teractions between types. The downside is that the partial correlations between 
gender and inflection class would be lost. 


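[Tree diagram: number markers, with daughters SG and PL, each dominating a flat list of
markers; legible leaf labels include -ă, -e, -iu, -a and -Vi]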


Figure 5.7: Simplified Romanian marker hierarchy 


There is no way of deciding a priori which of these three approaches is better.
The choice between the three will depend on considerations pertaining to what 
a theory of morphology should look like. They do, however, make slightly dif- 
ferent predictions in terms of what we should find in the analogical model. In 
the first hierarchy, we would expect there to be little to no confusion between 
feminine, masculine and neuter nouns, and there should be a separation between 
the classes with the -e, -ă and -i markers in the singular, as well as those with 
the -i markers in the plural. That is, these dimensions should not be available for 
the analogical model. 

The hierarchy in Figure 5.6 predicts that those three markers should be available
for nouns to cluster together, and we should thus see classes clustering around
these markers. Similarly, these classes should allow for some limited confusion
between masculine and feminine, especially in the classes with the shared
markers.

Finally, the hierarchy in Figure 5.7 predicts that clustering should be exclusively
about markers and not around gender. Therefore, we would expect classes with
shared markers to cluster together, but not classes forming clusters around the
genders they correlate with. The hierarchies in Figure 5.5 and Figure 5.6 do predict
some clustering around genders and some clustering around markers.


5.2.3 Data 


For this study, I used the Romanian dictionary DEX Online (https://dexonline.ro),
taking the database from the Python API (Năvălici 2013). From the dictionary¹⁰, I
extracted all nouns in the nominative form for which a plural form was specified.
From these, I removed all nouns with a plural form ending in s because these are 
clear borrowings from Spanish and other languages. Finally, I removed all nouns 
with common gender. This process gives us 63646 nouns. For each noun, I ex- 
tracted the plural and singular markers according to the description in Cojocaru 
(2003) and added the extra classes not listed there. 

The distribution of nouns by gender in the extracted corpus is, for a total 
of 63501 nouns: 38737 feminine nouns, 8891 masculine nouns and 15873 neuter 
nouns. 


5.2.3.1 Methodology and hypothesis 


There are basically two claims at stake. On the one hand, it has been argued that 
gender information is not helpful when figuring out the plural form of nouns 
in Romanian. On the other hand, we want to test which of the three inflection 
class configurations makes more accurate predictions regarding the analogical 
relations between the stems of the nouns. 

To test the first claim, we can fit two different analogical models: (i) one that
only looks at phonological information, which would approximate proposals for
gender assignment based on the ending of nouns in Romanian like those of Vrabie
(1989) and Vrabie (2000); and (ii) a similar model that also looks at gender
information. If gender carries no useful information about the plural form
of nouns in Romanian, as Bateman & Polinsky (2010) claim, then we should see
no difference in the performance of the two models. If adding gender clearly
increases accuracy, we can say that there is a high probability that gender does
in fact play a role in predicting inflection class.

More specifically, the nouns were extracted from the dex_lexemes,
dex_lexems_inflections and dex_inflections databases provided. The search
targeted entries with the fields: plural, nearticulat, Substantive and Nominative.

To test the second claim, we can look at the overall gender+inflection class
distribution. For this second part, I used a reduced and more balanced dataset. For
each noun, I extracted its class as a tuple: gender-singular-plural. This produces
a total of 57 classes: 17 feminine, 14 masculine and 26 neuter. The distribution
of classes by type frequency can be seen in Figure 5.8. From these, I removed the
three lowest frequency classes (marked in red in Figure 5.8), and took a random
sample of up to 3000 nouns for the more frequent classes. This produces a
somewhat more balanced dataset.
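The sampling step just described can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `(lemma, class_label)` input format and the fixed random seed are hypothetical, and the actual procedure used for the study may differ in detail.

```python
import random
from collections import defaultdict


def balance_by_class(nouns, max_per_class=3000, n_lowest_dropped=3, seed=0):
    """Return a more balanced subsample of (lemma, class_label) pairs.

    Assumed input format: class_label is a gender-singular-plural tuple
    rendered as a string, e.g. "f-ă-e" (hypothetical notation).
    """
    by_class = defaultdict(list)
    for lemma, label in nouns:
        by_class[label].append(lemma)

    # Rank classes by type frequency and drop the lowest-frequency ones.
    ranked = sorted(by_class, key=lambda label: len(by_class[label]))
    kept = ranked[n_lowest_dropped:]

    rng = random.Random(seed)
    sample = []
    for label in kept:
        members = by_class[label]
        # Cap the frequent classes at max_per_class randomly chosen nouns.
        if len(members) > max_per_class:
            members = rng.sample(members, max_per_class)
        sample.extend((lemma, label) for lemma in members)
    return sample
```

Capping rather than resampling keeps every noun of the smaller classes, which is why the result is only "somewhat" balanced.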
Figure 5.8: Class frequency


The basic prediction is that neuter classes should be confused with feminine
and masculine classes, but these two should not be confused with each other.


We cannot have complete certainty because it is always possible that a different model solely
based on formal cues could outperform the model including gender.


5.2.4 Results

5.2.4.1 Predicting gender


First, we predict gender from the shape of the stems with the formula: gender
~ final.1 + final.2 + final.3 + n vowels. Here we are simply looking at
the final three segments and the number of vowels in the stem. The results can 
be seen in Table 5.12 and Table 5.13. The tables show that gender is in fact pre- 
dictable without any information about number markers. However, in Table 5.12 
we do see that there is a relatively large confusion between masculine and neuter, 
and between neuter and feminine, but not so much between feminine and mas- 
culine. This is again confirmed in Table 5.14 (larger numbers mean more distance 
between the classes). This observation matches the hierarchy in Table 5.4, where 
feminine and masculine genders do not share any set of common nodes, but mas- 
culine and neuter, and feminine and neuter do. 
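The predictors in the formula above can be illustrated with a small feature extractor. This is a sketch under assumptions not stated in the text: segments are approximated by orthographic characters, `final.1` is taken to be the last segment, short stems are padded with `#`, and the vowel inventory and the key name `n_vowels` are chosen for illustration only.

```python
# Assumption: orthographic vowels stand in for vocalic segments.
ROMANIAN_VOWELS = set("aeiouăâî")


def stem_features(stem):
    """Extract the final three segments and the vowel count of a stem."""
    s = stem.lower()
    padded = "###" + s  # pad so that very short stems still yield three values
    return {
        "final.1": padded[-1],  # last segment (assumed ordering)
        "final.2": padded[-2],
        "final.3": padded[-3],
        "n_vowels": sum(ch in ROMANIAN_VOWELS for ch in s),
    }
```

For a hypothetical stem such as `cas`, this yields `s`, `a`, `c` as the three final segments and a vowel count of 1.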


Table 5.12: Confusion Matrix for the model predicting Gender of Romanian nouns

            Reference
Prediction  Feminine  Masculine  Neuter
Feminine    14314     783        1750
Masculine   80        665        494
Neuter      987       2926       6246
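The summary statistics reported for this confusion matrix (accuracy, no information rate, kappa and per-class sensitivity) follow directly from the counts. The sketch below recomputes them with plain Python; the function name is hypothetical, and it assumes, as in the table, that rows are predictions and columns are reference classes.

```python
def confusion_stats(matrix):
    """matrix[i][j] = number of items predicted as class i with reference class j."""
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]  # predictions per class
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference per class

    observed = sum(matrix[i][i] for i in range(k)) / total
    # Agreement expected by chance, from the marginal distributions.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (observed - expected) / (1 - expected)

    return {
        "accuracy": observed,
        "no_information_rate": max(col_sums) / total,
        "kappa": kappa,
        "sensitivity": [matrix[j][j] / col_sums[j] for j in range(k)],
    }


# Counts from Table 5.12 (order: Feminine, Masculine, Neuter).
GENDER_MATRIX = [
    [14314, 783, 1750],
    [80, 665, 494],
    [987, 2926, 6246],
]
gender_stats = confusion_stats(GENDER_MATRIX)
```

The low sensitivity for the masculine class (about 0.15) is what drives the confusion between masculine and neuter discussed in the text.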


5.2.4.2 Predicting singular 


Next, we turn to the number markers. In this case, we have several dimensions 
that need to be predicted. On the one hand, there are individual number markers, 
and on the other hand there are complete inflection classes with and without 
gender distinctions. Because there are some inflection classes which can appear 
with two genders, it is interesting to ask how well we can distinguish these cases. 
Additionally, because we are mostly interested in seeing how the clusters work, 
we can compare whether predicting inflection class without gender produces 
similar clusters to those we get when predicting inflection class with gender. 
We start with the singular markers with the formula: singular ~ final.1
+ final.2 + final.3 + n vowels. The results are given in Table 5.15 and
Table 5.16. We see that the singular marker, as defined here, is relatively predictable,
but not perfectly.


There were no hidden nodes and a decay rate of 0.


5.2.4.3 Predicting plural 


We now turn to plural markers. The model used is the same as for singular mark- 
ers, with the formula: plural ~ final.1 + final.2 + final.3 + n vowels. 
The results for predicting plural markers are shown in Table 5.17 and Table 5.18. 
What we find is that the model can predict plural markers somewhat less accu- 
rately than singular markers. Nevertheless, the accuracy and kappa scores are 
quite far above random chance. 

Now we address the claims by Bateman & Polinsky (2010) that gender does 
not really help to determine the plural marker a noun will take, and that plural 
class assignment is solely based on phonological features (including the singular 
marker). Properly testing this claim is not possible because the authors do not 
provide a full model for plural assignment. However, one can compare a model 
that only includes phonological features (and the singular marker) to one which 
also includes gender. We fit a model with the formula: plural ~ final.1 + 
final.2 + final.3 + n vowels + singular + gender. The results can be seen 
in Table 5.19 below. 

Predicting the plural marker with all predictors (gender and singular marker)
gives us the results presented in Table 5.19, and the corresponding statistics in
Table 5.20. The model evaluation is given in Figure 5.9, which shows that
removing gender from the model causes a very steep drop in accuracy, i.e. gender
does help in the analogical model. These results clearly speak against Bateman &
Polinsky's (2010) claim that gender does not help distinguish plural classes. If this
result is correct, then there are no strong arguments for a two-gender system for
Romanian.


5.2.4.4 Inflection class 


Finally, we turn to the prediction of inflection class. Again, there are two
possibilities we want to look at. First, we predict inflection class without making gender
distinctions, i.e. if class e-e is found in feminine and neuter, we assume this is a
single class and not two different classes. We use the same formula as before: class
~ final.1 + final.2 + final.3 + n vowels. The results of this model can be
seen in Figure 5.10 and the corresponding statistics in Table 5.21.


The model had no hidden nodes and a decay rate of 0.

Of course, there is always the possibility that a better model, solely based on phonological
features, would outperform the model presented here.


Table 5.13: Overall statistics for the Confusion Matrix in Table 5.12

Overall statistics:

Accuracy : 0.7515
95% CI : (0.7464, 0.7565)
No Information Rate : 0.5446
Kappa : 0.5564

Statistics by Class:

                   Feminine  Masculine  Neuter
Sensitivity        0.9306    0.1520     0.7357
Specificity        0.8031    0.9759     0.8019
Neg Pred Value     0.9064    0.8626     0.8759
Balanced Accuracy  0.8669    0.5639     0.768


Table 5.14: Distance matrix for the Confusion Matrix in Table 5.12

           Feminine  Masculine
Masculine  2.346527
Neuter     2.256934  0.154508


Table 5.15: Confusion Matrix for the model predicting the singular
marker of Romanian nouns

Reference
Prediction a ă C e ea i ie iu u
a 15 17 15 3 5 2 7 0 2
ă 156 6181 361 813 205 315 677 156 37
C 117 232 8104 159 37 530 441 24 648
e 51 604 218 2993 25 63 186 68 25
ea 10 1 1 6 3 17 4 0
i 2 25 20 12 13 80 37 3 2
ie 45 379 181 72 76 87 1784 94 5
iu 1 9 0 2 if 1 6 12 3
u 19 54 126 16 10 19 128 6 1375


Table 5.16: Overall statistics for the Confusion Matrix in Table 5.15

Overall statistics:

Accuracy : 0.7276
95% CI : (0.7223, 0.7327)
No Information Rate : 0.3196
Kappa : 0.6425


Table 5.17: Confusion Matrix for the model predicting the plural marker
of Romanian nouns

Reference
Prediction ale e ele i ie ii iuri uri Vi
ale 17 12 5 13 0 7 1 21 0
e 36 4570 106 1876 57 780 35 1053 56
ele 3 6 8 10 1 15 4 2 0
i 86 2415 121 7392 85 471 102 1219 61
ie 0 6 0 6 6 0 0 10 0
ii 25 530 85 187 3 1961 30 150 0
iuri 0 6 4 18 23 21 3 0
uri 5 665 55 796 55 117 43 2689 38
Vi 0 9 0 22 1 0 1 12 16


Table 5.18: Overall statistics for the confusion matrix in Table 5.17

Overall statistics:

Accuracy : 0.5905
95% CI : (0.5848, 0.5963)
No Information Rate : 0.3654
Kappa : 0.4278


Table 5.19: Confusion Matrix for the model predicting the plural marker
of Romanian nouns with additional gender information

            Reference
Prediction  ale   e     ele   i     ie    ii    iuri  uri   Vi
ale         126   51    1     4     0     0     0     0     0
e           41    6308  0     473   0     21    16    1334  25
ele         0     0     383   0     0     0     0     0     0
i           5     712   0     9807  0     0     0     30    0
ie          0     0     0     0     170   9     42    0     0
ii          0     3     0     0     11    3333  0     5     4
iuri        0     13    0     0     28    1     179   0     0
uri         0     1093  0     36    0     9     0     3790  0
Vi          0     39    0     0     0     1     0     0     142


Table 5.20: Overall statistics for the confusion matrix in Table 5.19

Overall statistics:

Accuracy : 0.8581
95% CI : (0.854, 0.8622)
No Information Rate : 0.3654
Kappa : 0.8063


Table 5.21: Overall statistics for the heat map in Figure 5.10


Overall statistics: 


Accuracy : 0.5577 
95% CI : (0.5518, 0.5635) 
No Information Rate : 0.1062 
Kappa : 0.5121 


Figure 5.9: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting plural in Romanian


Table 5.21 shows that the model performs worse than the model predicting
singular, but better (according to kappa score) than the model predicting only
plurals. From the heat map in Figure 5.10 it is clear that there is a high degree of
confusion between the different classes, but it also looks like this confusion is
not entirely random. The two strongest clusters of confusion are between classes
with a -C singular marker, and between classes with a -u singular marker. If
we perform cluster analysis on the corresponding similarity model, we get the
results in Figure 5.11. In this figure, I have additionally indicated the gender
information for the inflection class for convenience, but the model itself had no
information about gender. What can be seen from the clustering is that, although
there is organization along marker lines, the strongest clustering effect is that of
gender. Additionally, whenever masculine classes cluster together with neuter
classes, these share the same singular marker, and masculine-only classes seem
to cluster only with neuter classes.

Figure 5.10: Heat map for the model predicting inflection class in Romanian


Next, inflection classes are divided by gender, so that the five classes in the
dataset which are ambiguous for gender are split into individual classes (one for
each gender). For this model, the results are presented in Figure 5.12 and the
corresponding statistics in Table 5.22. There is practically no difference in accuracy
between the two models. The clustering for this model is shown in Figure 5.13. This
clustering reflects almost exactly the hierarchy in Table 5.6 on page 90. Most
clusters are found within a single gender exclusively (clusters in light brown,
light yellow, blue and dark gray), feminine and neuter, or masculine and neuter.
Particularly clear are clusters where neuter and masculine share the same
singular marker (clusters in pink, dark brown and light grey), or the feminine and
neuter share the same plural marker (the cluster in green). The only cluster
including the three genders has the classes with singular markers -e and -ă. Marker
-e is the only one connected to all three genders in Table 5.6, while -ă is the only
marker shared by feminine and masculine genders.

Romanian plural markers are strongly predictable from phonological features 
of nouns. The models presented here are a strong computational validation of 
Vrabie (1989) and Vrabie (2000). The model by Bateman & Polinsky (2010) is also 
partially supported in the sense that we see strong evidence for four agreement 
classes. But the model presented in this section also refutes Bateman & Polinsky 
in that there is evidence for a gender-number interaction. More precisely, there 
is evidence that inflection classes are partially dependent on gender, and that 
gender is predictive of plural, even when phonological features are considered. 


5 Gender systems 
Figure 5.11: Clustering analysis of singular-plural class in Romanian


Table 5.22: Overall statistics for the heat map in Figure 5.12


Overall statistics: 


Accuracy : 0.5546 
95% CI : (0.5488, 0.5605) 
No Information Rate : 0.1062 
Kappa : 0.5092 
Figure 5.12: Heat map for the model predicting inflection class by gender
in Romanian


Most importantly, we do not see any evidence for a flat inflection class hierarchy, 
nor for the more symmetric hierarchy in Table 5.5 or the simpler hierarchy in 
Table 5.7, but we do see evidence for the hierarchy in Table 5.6, where inflection 
classes are partially conditioned by their gender alignment. 


5.3 Interim conclusion 


In this chapter I have presented two cases of gender-inflection class interactions, 
namely nouns from the Latin third declension and Romanian nouns. In Latin, we 
saw a relatively simple system where syncretisms in the inflection of nouns are 
conditioned by their gender. The Latin case could be modeled with a very simple 
tree clearly reflected in the analogical system. The nouns in Romanian presented 
a much more complex interaction between gender and inflection class. Therefore, 
a much more elaborate hierarchy had to be postulated. Still, I showed that the 
analogical model was helpful in distinguishing between the three alternatives. 

With regard to the overall question of this book, in both cases we clearly saw
that there are reflexes of the hierarchical structure in the analogical relations
between the different classes.
This short chapter explores the situations where morphology allows for two
different, competing strategies to be applied to the same lexeme. In inflection, this is
called overabundance, while in derivation the phenomenon is called derivational
doublets. These systems can be modelled using hybrid classes, like the ones
introduced in Chapter 3. As for the test cases, I look at Russian diminutives
(derivational doublets) and Croatian instrumental singular markers (overabundance).
The interesting aspect of these two cases is that there is partial overlap between
class membership, which contrasts with the previous gender and inflection class
cases, where a stem or word can only belong to one inflection class at a time.
In cases of affix competition and overabundance, a single lexeme can belong to
two different classes at the same time, giving speakers a choice between two
formally different, but semantically identical, markers. This produces hierarchies
with a different shape, which has a clear effect on the analogical relations.


6.1 Overabundant inflection: Croatian singular instrumental


In BCS (Bosnian, Croatian and Serbian), a number of masculine nouns belonging
to the first (or -a) declension present partial overabundance between the markers
-em (/jem/) and -om (/om/) in the instrumental singular, as shown in (1):


(1) a. grad-om ‘city’-INSTR
    b. muž-em ‘man’-INSTR
    c. princ-om/princ-em ‘prince’-INSTR

Importantly, not all nouns can alternate between the two markers:

(2) a. kej-om ‘river bank’-INSTR
    b. *kej-em
    c. muž-em ‘man’-INSTR
    d. *muž-om


6 Hybrid classes 


A rule of thumb for class assignment has already been proposed in the literature:
“nouns ending in a palatal phoneme use -em, whereas all other nouns use
-om. However, although this rule seems reasonably straightforward, there are
some environments where doublets occur” (Lečić 2015: 377). Diachronically, this
overabundance (Thornton 2011; 2010a) emerged due to the collapse of an older
palatal /rʲ/, which justified the use of -em, with a modern non-palatal /r/, which
justifies the use of -om (Lečić 2015). Mlađenović (1977) (as cited by Lečić 2015)
claims that -om is spreading to contexts where -em would be historically used.

Some modern grammars give extremely general descriptions of this alternation:
“the masculine-neuter ending -om appears as -em after ‘soft’ consonants”
(Alexander 2006: 85). Similarly, other grammars seem to suggest that the
alternation is purely phonological: “Stems ending in a palatal cause vowel alternation
in the instrumental singular ending, e.g. učenikom ‘pupil’ - prijateljem ‘friend’...”
(Kordić 1997: 12). Yet other works argue that the distinction between -em and
-om is completely predictable from whether the noun ends in a hard (-om) or soft
consonant (-em) (Hammond 2005: 146). Additional phonological environments
of this alternation have been noted already:


Instrumental -em/-ем is normal with stems in -c, where vocative has
-e/-е and the first-palatalization alternation, as ótac/отац ‘father’,
vocative oče/оче. -om/-ом tends to be kept in foreign words and names
(Kiš-om/Киш-ом) and in words with e in the preceding syllable:
padež-om/падеж-ом ‘case’. (Brown 1993: 320)


As can be seen, the idea that analogical relations help predict this particular
alternation is not really new. However, Lečić (2015) convincingly shows that for
the majority of the proposed prescriptive rules of where and when to use -em
or -om, exceptions can be found in a corpus. This essentially means that there is
no obvious categorical rule that correctly predicts whether a noun will take -em,
-om or both. Secondly, and more importantly, the author shows that overabundant
nouns, even when very infrequent with one of the two forms, are acceptable
for speakers, whereas non-overabundant nouns (according to the corpus) are
acceptable with only one of the two forms. This strongly indicates that there really
are three classes of nouns in BCS: -om nouns, -em nouns and -om/-em nouns.


6.1.1 Modelling the system 


One way to model overabundance with type hierarchies is to employ
hybridization (Guzmán Naranjo & Bonami 2016). Hybridization assumes that
there are two basic types, exclusively-em and exclusively-om. Nouns of type
exclusively-em can only take the marker -em, and similarly for
exclusively-om. Nouns of a hybrid type em-om can take either -em or -om. This
hybrid type is related to the other two types as shown in Figure 6.1.


The soft consonants in Croatian are: č, š, ž, ć, đ, dž, nj, lj, j, c.


                        sg.ins
              em                    om
   excl-em           em-om                excl-om
muž      kazalište   princ      car        kej           konobar
‘man’    ‘theater’   ‘prince’   ‘emperor’  ‘river bank’  ‘waiter’


Figure 6.1: Hybridization in BCS nouns


In the present approach there are no constraints being inherited in the
hierarchy in Figure 6.1; the hierarchy simply organizes nouns according to
the markers they can take in the instrumental singular. Relevant constructions
or rules would then introduce the appropriate markers for each type. This can be
illustrated schematically in (3):


(3) a. [STEM(X^em)-em] > [SEM(X) + Inst + Sg]
    b. [STEM(X^om)-om] > [SEM(X) + Inst + Sg]


Because lexemes like princ and car belong to types om and em, both construc- 
tions can apply to them. Other implementations are possible, of course, but the 
important point is that the hierarchy in Figure 6.1 expresses that nouns that can 
take both markers share properties with those that can take only one. 
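The way the hybrid type licenses both constructions can be made concrete with a toy implementation. This is a minimal sketch, not the formalization used in the text: the dictionaries, the function name and the treatment of markers as plain string suffixes are illustrative assumptions, and the lexicon contains only some of the example nouns from Figure 6.1.

```python
# Supertypes of each leaf type in the hierarchy of Figure 6.1.
SUPERTYPES = {
    "excl-em": {"em"},
    "excl-om": {"om"},
    "em-om": {"em", "om"},  # hybrid type: a subtype of both em and om
}

# Toy lexicon: stem -> leaf type (examples taken from Figure 6.1).
LEXICON = {
    "muž": "excl-em",
    "princ": "em-om",
    "car": "em-om",
    "kej": "excl-om",
    "konobar": "excl-om",
}


def instrumental_singular(stem):
    """Return every instrumental singular form licensed for a stem.

    Each supertype (em, om) is paired with one construction that simply
    suffixes the corresponding marker to the stem.
    """
    leaf = LEXICON[stem]
    return sorted(stem + marker for marker in SUPERTYPES[leaf])
```

Calling `instrumental_singular("princ")` returns both `princem` and `princom`, whereas `muž` and `kej` each receive a single form.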

A complex issue that arises with hybrid hierarchies is what happens to the
analogical filters in such cases. The analogical function for some leaf type
contains all the generalizations, as well as exceptions, that determine whether any
given lexeme belongs to said type or not. In terms of the model of analogy as a
type constraint, the type em-om inherits all analogical constraints from em and
om as: [em-om PHON] = [em PHON] ∧ [om PHON]. This means that em-om nouns
will end up looking like nouns from the classes excl-om and excl-em, because they
must satisfy the same constraints.

The prediction this approach makes is that we expect the confusion between
em-om and each of the two exclusive classes excl-em and excl-om to be relatively
higher than the confusion between the two exclusive classes (excl-em and
excl-om) themselves.


6.1.2 Materials 


I extracted all 13227319 instances of instrumental singular nouns from the Web
Corpus of Croatian, Bosnian and Serbian (Ljubešić & Klubička 2014) (1.9 billion
tokens). Of these, 6575746 are masculine nouns. After removing clear mistakes
(punctuation marks, etc.), the total number of types was 227263. The number of
types which appeared with either -em or -om, or both, was 186443. The final
dataset (after removing cases that appeared with multiple spellings) contained 180987
nouns, with 39245 nouns (22%) taking -em, 137290 nouns (76%) taking -om, and
4452 nouns (2%) taking both markers. Because we cannot know from a corpus
whether a noun is not overabundant, there is always the risk of having many false
negatives, particularly in the lower frequency cases (since it is possible that they
are overabundant but there were not enough cases in the corpus for them to appear
with both markers). There is also a large imbalance in the type frequency of each of
the three classes. To address both problems, I only used -em nouns with a
frequency of more than 60, -om nouns with a frequency of more than 500, and
-em/-om nouns with a frequency of more than 100. This process produces a dataset with
3138 -om nouns, 2056 -em nouns and 1293 -em/-om nouns. These numbers are
somewhat arbitrary, but they produce a more balanced sample and help control
for false negatives. By selecting only the more frequent nouns, there is a higher
probability that the class assignment is correct.

I take the stem of the nouns to be the instrumental singular minus the -em or
-om endings. I performed no orthography corrections for this dataset.
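The class assignment and frequency filtering described in this section can be sketched as follows. The code is illustrative only: the function name and input format are hypothetical, and it assumes that the frequency thresholds apply to the total token frequency of a stem in the instrumental singular, which the text does not spell out.

```python
from collections import Counter

# Minimum frequencies per class, as given in the text.
THRESHOLDS = {"em": 60, "om": 500, "em-om": 100}


def classify_stems(tokens, thresholds=THRESHOLDS):
    """Map each stem to em, om or em-om from attested instrumental forms.

    tokens: iterable of instrumental singular word forms (one per corpus hit).
    """
    counts = {}
    for form in tokens:
        for marker in ("em", "om"):
            if form.endswith(marker) and len(form) > len(marker):
                stem = form[: -len(marker)]  # stem = form minus the ending
                counts.setdefault(stem, Counter())[marker] += 1
                break

    dataset = {}
    for stem, by_marker in counts.items():
        if by_marker["em"] and by_marker["om"]:
            label = "em-om"  # attested with both markers: overabundant
        elif by_marker["em"]:
            label = "em"
        else:
            label = "om"
        # Keep only stems that are more frequent than the class threshold.
        if sum(by_marker.values()) > thresholds[label]:
            dataset[stem] = label
    return dataset
```

A stem attested with only one marker may still be overabundant in reality, which is why the thresholds favour frequent nouns.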


6.1.3 Results 


The model is rather simple for this case. The predictors are the last two segments
and the number of consonant clusters in the noun: class ~ final.1 + final.2
+ n cluster. The results of the model can be seen in Table 6.1 and the
corresponding statistics in Table 6.2.

This model had one hidden layer with 5 nodes and a decay rate of 0.01.


Table 6.1: Confusion Matrix for the model predicting instrumental singular
in Croatian nouns

            Reference
Prediction  em    em-om  om
em          1887  445    25
em-om       147   502    188
om          22    346    2925


Table 6.2: Overall statistics for Confusion Matrix Table 6.1

Overall statistics:

Accuracy : 0.8192
95% CI : (0.8096, 0.8285)
No Information Rate : 0.4873
Kappa : 0.7051


Figure 6.2: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting Croatian instrumental


We see that the model predicts the declension class of these nouns fairly well,
although it makes a large number of mistakes when predicting -em/-om.
Most relevant here is the degree of confusion between the three classes. It can be
observed that excl-om and excl-em have a small rate of confusion between them.
The greater amount of confusion is between em-om and excl-em, and em-om and
excl-om. This is shown more clearly in Figure 6.3. The Y axis shows the percentage
of predicted classes for each class. We see here that excl-em is rarely predicted
to be excl-om and vice versa. Meanwhile, em-om is often predicted to be excl-em
and excl-om.
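The proportions of predicted classes per reference class can be recomputed from the counts in Table 6.1. The sketch below assumes, as in the table, that rows are predictions and columns are reference classes; the function and variable names are hypothetical.

```python
LABELS = ["em", "em-om", "om"]

# Counts from Table 6.1: rows are predictions, columns are reference classes.
TABLE_6_1 = [
    [1887, 445, 25],
    [147, 502, 188],
    [22, 346, 2925],
]


def predicted_proportions(matrix, labels):
    """For each reference class, the share of items predicted as each class."""
    result = {}
    for j, reference in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        result[reference] = {
            labels[i]: matrix[i][j] / column_total for i in range(len(labels))
        }
    return result


proportions = predicted_proportions(TABLE_6_1, LABELS)
```

With these counts, about 1% of excl-em nouns are predicted as excl-om and under 1% of excl-om nouns as excl-em, while roughly 34% and 27% of em-om nouns are predicted as excl-em and excl-om respectively.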
Figure 6.3: Proportion of confusion between classes in Croatian


This distribution is exactly what the model predicts, but it also makes sense
from a historical perspective. As mentioned above, it was the em nouns which lost
their distinctive palatal r that started taking om. That is, only when a certain set
of em nouns came to have phonological shapes which would fit the om class did these
nouns start being overabundant. We thus have a system that went from
being perfectly predictable (as already mentioned, a simple case of phonologically
conditioned allomorphy) to overabundance.


6.2 Frequency and analogical similarity: Russian diminutives

6.2.1 Russian diminutives


Nouns in Russian can form the diminutive with a wide range of different suffixes.
Some examples of diminutive suffixes are shown in (4).


(4) -jets (-ец), -ik (-ек), -jok (-ёк), -ochik (-очек), -jechik (-ечек), -jochik
    (-ёчек), -itsa (-ица), -ichka (-ичка), -iko (-ико), -ko (-ко), -jetso (-ецо), -tso
    (-цо), -tse (-це), -ik (-ик), -ok (-ок), -chik (-чик)


A previous version of the study in this section was presented in Olinco 2016 (Guzmán Naranjo
& Pyatigorskaya 2016).

For clarity I will use the Latin transliterations in the examples, but the models were trained
using their Cyrillic forms.


The choice of suffix is partly due to the gender of the noun (Kempe & Brooks
2001; Kempe et al. 2003), as described in (5), but not completely. There are several
possible diminutive forms for each gender, and the explanation of why some nouns
choose one or the other is not completely clear. As the whole system is too complex
to be addressed here, I will focus on masculine nouns that build the diminutive
with -ik, -ok, or -chik exclusively.


(5) a. -iko, -ko, -tso, -tse: neuter nouns
    b. -itsa, -ichka: feminine nouns
    c. -ik, -ok, -chik: masculine nouns


In the masculine subset (-ik, -ok, -chik), we find a particularly complex affix
competition problem. Example (6) illustrates some nouns which can only appear
with one of the three forms, while (7) shows the nouns which occur with two of
the three different markers.


(6) a. -ik, *-chik, *-ok: stol ‘table’, kot ‘cat’, mjač ‘ball’
    b. *-ik, -chik, *-ok: zabor ‘fence’
    c. *-ik, *-chik, -ok: molot ‘hammer’, vjechjer ‘evening’

(7) a. -ik, -chik, *-ok: stul ‘chair’, shkaf ‘cabinet’
    b. *-ik, -chik, -ok: rukav ‘sleeve’
    c. -ik, *-chik, -ok: rot ‘mouth’, list ‘leaf’, chas ‘hour’
    d. -ik, -chik, -ok: ?


A similar situation can be found in German, where there is competition
between the diminutive forms -chen and -lein, with some degree of overlap between
the two: Häuschen-Häuslein ‘small house’. Similarly, Spanish has the forms -illo,
-ito, -cito, -ico, among others, and some overlap between these forms in a
substantial set of nouns: casita ‘small house’, pollito ‘chick’, gatito-gatico ‘kitten’.

For Russian, there seems to be no rule-based account of which nouns can take 
which markers (Gouskova et al. 2015). Some research on Russian diminutives has 
focused on the relation between the different forms and gender, as well as gender 
acquisition (Kempe et al. 2010; Protassova & Voeikova 2007; Voeykova 1998), but 
relatively little attention has been given to the actual conditions that help decide 
between the different forms. Gouskova et al. (2015) is the most recent approach
to this problem. The authors propose what they label sublexical phonotactics, a
model that is very similar to an analogical model. The basic idea is that each form
a speaker encounters is stored in a sublexicon specific to that form, i.e. -ik forms
are stored in an ik sublexicon, -chik forms are stored in a chik sublexicon and
so on. Speakers find phonotactic regularities in each sublexicon, and new items
are coined based on those regularities. Conceptually, there are only a few minor
differences between traditional analogical models and Gouskova et al.'s (2015),
but in terms of implementation some issues with the latter exist. From a
theoretical perspective, Gouskova et al. (2015) propose a flat model, where speakers
simply have lists for each type, and there is no structuring of said types. This is
quite common in analogical approaches.

Because the issues with Gouskova et al.'s (2015) approach are of secondary
concern, I will only discuss them briefly. The essence of their implementation is
as follows. In a first step, the UCLA Phonotactic Learner (Hayes & Wilson 2008)
infers phonotactic regularities in a dataset of Russian
nouns marked for their diminutive preference (the phonotactic regularities are
inferred from the bases of the nouns). In a second step, a mixed effects model
is trained on the phonotactic generalizations to determine which are statistically
significant.

There are several problems with this method. Of some real concern is that
there is no cross-validation. Their results stem from testing the model on the
same dataset it was trained on; however, this could be solved. It is also
somewhat unclear what the purpose of the mixed effects model is, because the UCLA
learner is already predicting classes based on the phonotactic patterns. Finding
statistically significant patterns is problematic because these patterns are highly
correlated with each other (because of phonotactics), and mixed effects models
are not robust against collinearity, which means any statistical significance is in
question.

The results in Table 6.3 can be obtained by looking at the predictions made
by the UCLA learner vs observed diminutives in the dataset. The corresponding
statistics are in Table 6.4. It is clear that the UCLA model is not learning to
properly discriminate the diminutive classes, and the model performs at chance level.


I could not reproduce their results in order to check because the versions of the statistical
software used by the authors are no longer supported.

The original dataset and code used by Gouskova et al. (2015) were kindly provided to me by
the lead author. I use the results from their code.


Table 6.3: Actual results for Gouskova et al.'s (2015) model

           Predicted
Reference  chik  ik  ok
chik       135   54  76
ik         133   54  65
ok         122   56  75


Table 6.4: Overall statistics for Confusion Matrix Table 6.3 


Overall statistics: 


Accuracy : 0.3429 
95% CI : (0.3093, 0.3776) 
No Information Rate : 0.5065 
Kappa : 0.01 


In one of the tests, the authors try to evaluate their model on a wug
experiment with Russian speakers. They designed 300 nonce words and asked speakers
to produce the corresponding diminutives. The authors claim high correlation
between their model and speaker choices, but it is not clear how this correlation was
measured. First, their evaluation is made by using Kendall's and Spearman's
correlation coefficients, which would lead one to believe that they are testing predicted
vs produced proportions, but this is not made clear. In the code, the correlation
calculations are made on categorical variables, which is not advisable and makes
the results hard to interpret.

Despite these potential issues, the basic idea that the affix competition in
Russian diminutives is resolved analogically is on the right track. In the end, it is not
of too much interest whether the phonotactics approach could outperform the
neural networks I employ here, or the other way around. The important question
that is left to be addressed is whether a flat list approach like that of Gouskova
et al. (2015) is more appropriate than a structured model.


6.2.2 Modelling the system 


Conceptually, if one ignores semantics and stress assignment (which also seem to
have no straightforward solution according to Gouskova et al. 2015), it is possible
to capture the system with a cross-classification approach similar to the one used for
the Croatian system.

The hierarchy in Figure 6.4 gives a simple sketch of how the system can be
captured. It shows that all pairwise combinations are possible.⁷


DIM 


IK CHIK OK 


ik ik~chik chik ik-chik-ok ok ik-ok chik-ok 
Figure 6.4: Hybridization in Russian nouns 


Since all combinations are possible, we expect confusion between all three 
types. However, the frequency at which the combinations occur is not uniform. 
Figure 6.5 shows that the most frequent classes are the non-alternating classes
and the ik-ok class (see the next section for details on the dataset used). This 
case is thus interesting because it shows the effects of type frequency. If type 
frequency plays no role, then we would expect that the confusion between ik, ok 
and chik should be more or less equal. If, on the other hand, type frequency does 
play a role, we expect confusion between all classes, with the highest confusion 
between ik, ok and ik-ok, while the lowest confusion should be between chik and 
ik-ok (since this combination was not attested). 


6.2.3 The dataset 


I used the diminutive dataset collected by Gouskova et al. (2015), but hand-checked
it with a native speaker of Russian.⁸ The original dataset was extracted
from the Google Ngram corpus (Michel et al. 2011) and contains 1367 forms.
Since there are no Russian taggers which can identify diminutives, the authors
relied exclusively on the endings of the words to find diminutive forms. As a result,
the dataset contains many problematic cases. To solve this, we removed
errors (perceived to be ungrammatical by a few informants), non-diminutives


⁷ I am not aware of cases where all three suffixes are possible with one noun and could not elicit
any from my informants.

⁸ This part of the work would not have been possible without the invaluable help of Elena
Pyatigorskaya, who manually checked and corrected the whole dataset.


Figure 6.5: Type frequency of Russian suffix classes


(e.g. the word alkogol-ik 'alcoholic' has an -ik suffix but is not actually a diminu- 
tive) and non-words. This left us with 821 diminutives. My informant provided 
stress marks for the bases of the selected diminutives. 

Because there are not enough cases for the classes chik-ok (f=7) and chik-ik
(f=8), I removed them from the dataset before fitting the models.⁹ The final dataset
had a total of 811 nouns.


6.2.4 Results 


To predict the diminutive forms, I fitted a model using the formula: diminutive
~ final.1 + final.2 + length_letters * n_vowels + stress_position *
stressed_vowel. This model looks at the final two segments of the
base nouns (in the nominative), the interaction between the length of the base
and the number of vowels of the base, and the interaction between the position of
the stress in the word (counting from the right) and the stressed vowel. The results
can be seen in Table 6.5 and the corresponding statistics in Table 6.6. The relative
importance of the predictors is shown in Figure 6.6.

First of all, the model is accurate overall (0.762) and well above random chance
(the no-information rate is 0.365). All four classes can be distinguished to a certain
degree. The most important factor is the last segment, but the other factors all seem
to have an important effect.


⁹ These two classes were not predictable at all if included. It is hard to tell whether a model with
more examples would perform better in these two cases.
Table 6.5: Confusion Matrix for the model predicting Russian diminutives


              Reference
Prediction    chik     ik   ik-ok     ok
chik           177     11       1     13
ik               8    175      28     33
ik-ok            1     22      34     18
ok              13     30      15    232


Table 6.6: Overall statistics for Confusion Matrix Table 6.5 


Overall statistics: 


Accuracy : 0.762 
95% CI : (0.7312, 0.791) 
No Information Rate : 0.365 
Kappa : 0.6654 


Figure 6.6: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting Russian diminutives


More importantly, we find exactly the predicted pattern in the error distribution.
The class ik-ok is less confused with the class chik than with any of the other
classes. Class chik is rarely confused with classes ik and ok, while these two are
confused with each other with relatively high frequency. This is clearly shown
in Figure 6.7.




Figure 6.7: Proportion of confusion between classes in Russian 
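
The proportions discussed here can be approximated from the counts in Table 6.5. The sketch below divides each cell by its column total, so that every reference class sums to 1; whether Figure 6.7 was built with exactly this normalisation is an assumption, and the function name is illustrative.

```python
classes = ["chik", "ik", "ik-ok", "ok"]

# Table 6.5: rows are predictions, columns are reference classes.
table_6_5 = [
    [177, 11, 1, 13],
    [8, 175, 28, 33],
    [1, 22, 34, 18],
    [13, 30, 15, 232],
]


def confusion_proportions(matrix, labels):
    """For each reference class, return the share of its items assigned to each predicted class."""
    proportions = {}
    for j, reference in enumerate(labels):
        column_total = sum(row[j] for row in matrix)
        proportions[reference] = {
            predicted: matrix[i][j] / column_total
            for i, predicted in enumerate(labels)
        }
    return proportions


proportions = confusion_proportions(table_6_5, classes)
for reference in classes:
    shares = ", ".join(f"{p}: {v:.2f}" for p, v in proportions[reference].items())
    print(f"{reference:>6} -> {shares}")
```

Read this way, ik-ok nouns are assigned to chik far less often than to ik or ok, which is the pattern described in the text.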


There are two ways to interpret these findings. In one scenario, one
could postulate that quantitative information needs to be hard-coded
into the hierarchy, i.e. that we should assign stronger connections to IK and
OK than to the other combinations. I propose, however, that this approaches the
problem backwards. The more straightforward alternative is to see the higher
type frequency of ik-ok as a byproduct of the analogical system itself, and not
as something one has to directly integrate into the model. The fact that we have
more ik-ok nouns than chik-ik or chik-ok nouns is due to the constraints for IK
and OK being more compatible with each other, and producing a more relaxed
set of constraints than CHIK-IK or CHIK-OK.


6.3 Interim conclusion 


I have shown how the hybridization model can properly predict the distribution
of both partially overlapping cases in Croatian, and overlapping diminutives
with different type frequencies in Russian. These two examples show that the
predicted effects are not only present in simple trees, but can also be observed
in more complex hierarchical structures. These results clearly reject the flat list
approach and support a structured organization of these systems. It is also
interesting to see that, despite the fact that one case is inflectional morphology and
the other derivational morphology, the results for both studies are very similar in
terms of the distribution of errors and the analogical relations. This result argues
for an organization of the stems in classes independent of the type of morphological
process, at least for morphological theories that make a distinction between
inflection and derivation. So, even if overabundance in inflection and affix
competition in derivation are treated as different kinds of phenomena, the underlying
structures would be equivalent.
7 Morphological processes and analogy


So far we have seen how the analogical relations between nouns reflect the grammatical
structuring and type system of the lexicon. A common trait in the previous
cases is that the morphological markers have all been suffixes. We also
saw that it was only the ending of the stems (and some additional phonological
information like the number of syllables and stress placement) that helped as
predictors. This kind of correlation is often found in the literature on phonologically
conditioned morphology and analogy in general. There are only a handful
of studies in which the beginnings of words were found to have a conditioning
effect on some morphological process (Bybee & Slobin 1982; Köpcke & Zubin
1984), and studies that examine prefixes are even rarer.

Some well-known phenomena in phono-syntax suggest that this relation
might not be coincidental. The choice between a and an in English, or the choice
between la and el in Spanish (in Spanish, feminine nouns can use the masculine
definite article el if they begin with a stressed /a/, see Harris 1987), are conditioned
by the first segment of the following word. This makes intuitive sense,
but it is not obvious why it should be the case. It would be perfectly possible for
suffix selection to depend on the first segment of the stem, or the second vowel,
etc.

To explore this question I look at three different phenomena in this chapter:
Swahili noun classes, Otomi verb classes and Hausa plurals. Swahili and Otomi
are relevant to the overall question of this chapter because they use prefixes
instead of suffixes, and Hausa has complex plural formations.


7.1 Prefixes and gender: Swahili noun classes


Swahili, like other Bantu languages, has a noun class system in which all nouns
belong to a specific, partially conditioned, class. Traditional Swahili grammars
list eleven main classes for Swahili nouns, which are presented in Table 7.1.¹ These
classes are defined by a prefix on the noun and can mark either singular or plural.
For the most part, noun classes are lexically determined, with a few classes being
determined by derivational morphemes (diminutives, etc.).


¹ I have omitted classes 14 (abstractions), 15 (verbal infinitives) and 16-18 (locatives). For classes
9 and 10, N represents three possible markers: n-, ny- or m-.


Table 7.1: Swahili noun classes


class    form       number
1        m-         singular
2        wa-        plural
3        m-         singular
4        mi-        plural
5        Ø ~ ji-    singular
6        ma-        plural
7        ki-        singular
8        vi-        plural
9        N-         singular
10       N-         plural
11       u-         singular

Corbett (1991), however, suggests that Swahili noun classes should be treated 
as genders, not very differently from other gender systems. The reason is that all 
the properties of a gender system are present in the Swahili class system, like 
agreement with determiners and adjectives as shown in (1).²


(1) ki-kapu      ki-kubwa    ki-moja    ki-lianguka
    CL7-basket   CL7-large   CL7-one    CL7-fell
    ‘One large basket fell.’


The class marker ki- appears on the noun and on the agreeing adjective, determiner and
verb, just like German adjectives agree with their nouns. The fact that these are genders
can be seen more clearly from cases where the prefix on a noun is ‘wrong’, in the
sense that it usually denotes some other class than the one it is actually agreeing
with. In (2) (Corbett 1991: 45) we see, for example, in (a) that m-tu ‘person’ takes a
marker for class 1, while the agreement on the verb is the marker of class 2. A
similar situation arises in example (b), where there seems to be a disagreement
between the different markers. For this reason Corbett (1991) argues that there
are two different systems: inflection class and gender proper.


² The examples in this section are taken from Corbett (1991), who in turn takes them from
Welmers (1973: 159-183).
(2) a. m-tu          wa-mepotea
       CL1-person    CL2-is.missing
       ‘A person is missing.’
    b. ki-faru           m-dogo       wa-likuwa    hapa
       CL7-rhinoceros    CL1-small    CL2-was      here
       ‘A small rhinoceros was here.’


Thus, grouping the singular and plural forms we get the six genders (the orig- 
inal proposal in Corbett (1991: 47) suggests seven) in Table 7.2. 


Table 7.2: Swahili genders


Class    Prefix on noun    Verbal agreement
1/2      m-/wa-            a-/wa-
3/4      m-/mi-            u-/i-
5/6      Ø ~ ji-/ma-       li-/ya-
7/8      ki-/vi-           ki-/vi-
9/10     N-/N-             i-/zi-
11/10    u-/N-             u-/zi-


Swahili has received some attention with respect to how nouns are assigned to 
a given gender. Corbett (1991: 47) suggests that “for Swahili we require semantic 
and morphological assignment rules”. The author lists (p. 47) the following rules 
(adapted) to account for how nouns are assigned to their gender class in Swahili. 
When in conflict, the semantic rules override the morphological rules: 


Semantic assignment:

1. augmentatives belong to gender 5/6
2. diminutives belong to gender 7/8
3. remaining animates belong to gender 1/2

Morphological assignment:

1. morphological class 3/4 (m-/mi-) → gender 3/4
2. morphological class 5/6 (Ø ~ ji-/ma-) → gender 5/6
3. morphological class 7/8 (ki-/vi-) → gender 7/8
4. morphological class 9/10 (N-/N-) → gender 9/10
5. morphological class 11/10 (u-/N-) → gender 11/10
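
The precedence of semantic over morphological rules can be made explicit in a few lines. The following is only an illustrative sketch of the rule ordering listed above: the function name, the boolean feature names and the string labels for genders are hypothetical and do not come from Corbett (1991).

```python
from typing import Optional

# Morphological classes that map onto the gender of the same name.
MORPHOLOGICAL_GENDERS = {"3/4", "5/6", "7/8", "9/10", "11/10"}


def assign_gender(morph_class: Optional[str], augmentative: bool = False,
                  diminutive: bool = False, animate: bool = False) -> Optional[str]:
    """Assign a Swahili gender; semantic rules are tried first and so override morphology."""
    if augmentative:
        return "5/6"
    if diminutive:
        return "7/8"
    if animate:
        return "1/2"
    if morph_class in MORPHOLOGICAL_GENDERS:
        return morph_class
    return None  # not covered by the rules listed above


# ki-faru 'rhinoceros' carries the class 7/8 prefix but is animate.
print(assign_gender("7/8", animate=True))
# An inanimate noun with the m-/mi- prefixes keeps its morphological class.
print(assign_gender("3/4"))
```

With this ordering, ki-faru is assigned to gender 1/2 despite its ki- prefix, which is the kind of mismatch between prefix and agreement shown in example (2b).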


Corbett (1991: 48) also provides some additional semantic regularities: plants 
are often in gender 3/4, fruits in gender 5/6, animals in gender 9/10 and small 
objects in gender 7/8. This list is further expanded by Contini-Morava (1994), 
who provides strong additional semantic grounding for most of the six genders. 

With all these rules combined, we have a system where we expect that phonological
analogies will be rather weak. Because of its heavy semantic component,
and because speakers are usually quite certain with regard to inflectional class
assignment upon encountering a noun, the need for analogical relations is greatly
reduced.


7.1.1 Materials 


I compiled a list of Swahili nouns with their corresponding classes by combining
the list given in the Wiktionary page for Swahili (Wikimedia Foundation
2019) with all the nouns for which class information is available in the
Mgombato: Digo-English-Swahili Dictionary (Mwalonya et al. 2004). Because the
extraction from the dictionary relied on optical character recognition,
there is some degree of noise in the data. I removed all clear errors, such as nouns
containing punctuation marks. The result is 3081 nouns, distributed as shown in
Figure 7.1. There were not enough u- marked nouns to properly work with the
11/10 gender.

Because the classes are uneven in terms of members, models including the
whole data-set tended to under-perform.³ To control for this, I randomly extracted
378 nouns for each class (the size of the smallest class in the original
data-set). This produced a final data-set with 1890 nouns in total.
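
This kind of balancing can be sketched as random downsampling to the size of the smallest class. The code below is a hypothetical illustration, not the procedure actually used for the data-set: the function name, the fixed seed and the toy data are all assumptions.

```python
import random
from collections import defaultdict


def balance_classes(items, seed=0):
    """Randomly downsample (noun, class) pairs so every class matches the smallest one in size."""
    by_class = defaultdict(list)
    for noun, noun_class in items:
        by_class[noun_class].append(noun)
    smallest = min(len(nouns) for nouns in by_class.values())
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    balanced = []
    for noun_class, nouns in sorted(by_class.items()):
        for noun in rng.sample(nouns, smallest):
            balanced.append((noun, noun_class))
    return balanced


# Toy data: invented placeholder strings, not real Swahili nouns.
toy = [(f"noun{i}", "1/2") for i in range(5)] + [(f"noun{i}", "7/8") for i in range(5, 8)]
balanced = balance_classes(toy)
print(len(balanced))
```

Applied to five classes whose smallest member has 378 nouns, the same logic yields 5 × 378 = 1890 items.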

In terms of pre-processing, Swahili has a series of digraphs (e.g. mb → /ᵐb/),
which I converted into single character representations to aid the analogical
model. Otherwise, this is a relatively poor data-set in terms of features. We do
not have any extra semantic or morphological information to aid the models.
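
A digraph conversion of this kind can be sketched as a left-to-right scan with a lookup table. Only mb is mentioned in the text; the other digraphs in the table and all replacement symbols below are illustrative assumptions, and the sketch assumes lower-case input so that the upper-case stand-ins cannot clash with ordinary letters.

```python
# Hypothetical digraph-to-symbol table; only "mb" comes from the text.
DIGRAPHS = {
    "mb": "B",
    "nd": "D",
    "ng": "G",
    "ny": "N",
    "ch": "C",
    "sh": "S",
}


def collapse_digraphs(word, table=DIGRAPHS):
    """Scan left to right and replace each listed digraph with its single-character stand-in."""
    out = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(word[i])
            i += 1
    return "".join(out)


# Swahili mbwa 'dog' and nyumba 'house'.
print(collapse_digraphs("mbwa"))
print(collapse_digraphs("nyumba"))
```

After this step, predictors such as the first or last segment can be read off by simple string indexing, since each segment occupies exactly one character.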


³ The reason is that the neural network models are sensitive to type frequency. This is not very
important if the predictors are strong enough, but in cases where the predictors are weak, the
model tries to optimize for general accuracy, and over-predicts the most frequent class.
Figure 7.1: Type frequency of Swahili genders


7.1.2 Results 


In our first model we investigate whether the first and second segments of the 
stem (that is, after removing the class prefixes) can predict to any degree the 
inflectional class of Swahili nouns with the model class ~ first.1 + first.2.⁴
The results, shown in Table 7.3 and Table 7.4, are not very good in themselves. 
The accuracy is barely above chance, and the kappa score is very small. This 
basically means that there is very little information about inflection class just 
in the phonological shape of the stem. But this result is not really surprising. 
Swahili speakers encounter nouns with the prefix or some agreeing forms, and 
there is little ambiguity about their class. 

Next, we compare this model to one where we use the endings of the nouns
instead of the initial segments, as shown in Table 7.5 and Table 7.6. In this model
we see performance close to chance level.

Finally, we try a model that combines the first two segments of the noun,
the last segment, and length in letters with the formula: class ~ final.1 +
first.1 + first.2 + length. The results are presented in Table 7.7 and
Table 7.8. This model shows a slight improvement over the model using only the
first segments.

The overall evaluation of this final model can be seen in Figure 7.2. This figure 
basically shows that the main effect comes from the first segment, but that the 
other factors still play a minor role. 


⁴ With 0 hidden nodes and a decay rate of 0.1. A more complex model with interactions did not
perform any better.
Table 7.3: Confusion Matrix for the model predicting inflection class of
Swahili nouns 


Reference 


Prediction 1-2 3-4 5-6 7-8 9-10 
1-2 155 96 47 69 46 
3-4 85 130 48 78 63 
5-6 44 49 168 84 74 
7-8 44 53 46 92 49 
9-10 50 50 69 55 146 


Table 7.4: Overall statistics for Confusion Matrix in Table 7.3 


Overall statistics: 


Accuracy: 0.3656 
95% CI: (0.3439, 0.3878) 
No Information Rate: 0.2 
Kappa: 0.2 


Table 7.5: Confusion Matrix for the model predicting inflection class of 
Swahili nouns 


Reference 


Prediction 1-2 3-4 5-6 7-8 9-10 
1-2 195 94 92 89 102 
3-4 35 91 71 79 43 
5-6 32 49 54 40 58 
7-8 31 68 67 91 54 
9-10 85 76 94 79 121 
Table 7.6: Overall statistics for Confusion Matrix in Table 7.5


Overall statistics:


Accuracy: 0.2921
95% CI: (0.2716, 0.3131)
No Information Rate: 0.2
Kappa: 0.1151


Table 7.7: Confusion Matrix for the model predicting inflection class of 
Swahili nouns 


Reference 


Prediction 1-2 3-4 5-6 7-8 9-10 
1-2 178 83 49 42 73 
3-4 68 158 47 86 60 
5-6 44 43 164 91 58 
7-8 25 55 56 105 40 
9-10 63 39 62 54 147 


Table 7.8: Overall statistics for Confusion Matrix in Table 7.7 


Overall statistics: 


Accuracy: 0.3979 
95% CI: (0.3757, 0.4204) 
No Information Rate: 0.2 
Kappa: 0.2474 
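
Because the balanced data-set has five equally sized classes, the agreement expected by chance is 0.2 whatever the model predicts, so kappa is simply a rescaled accuracy. As a worked check with the values in Table 7.8, assuming the reported statistic is Cohen's kappa with observed agreement p_o and chance agreement p_e:

```latex
\kappa = \frac{p_o - p_e}{1 - p_e}
       = \frac{0.3979 - 0.2}{1 - 0.2}
       = \frac{0.1979}{0.8}
       \approx 0.2474
```

The same calculation with the accuracy in Table 7.6 gives (0.2921 - 0.2) / 0.8 ≈ 0.1151.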
Figure 7.2: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting gender in Swahili


The model including both beginning and ending of the nouns clearly per- 
formed better, and even though the main effect came from the beginning of the 
nouns, the ending did play a role. 

It is possible that the current analogical relations of the Swahili noun classes 
are the product of some previous more regular system (Nurse & Hinnebusch 
1993), and not of actual productive schemas speakers use. Because the analogical 
effects are so weak, the most likely explanation in this case is that the semantic 
component is much stronger, and thus phonological analogy is not as important 
for speakers. The important point here is that we do see a stronger effect of the 
beginning of the stem than of the ending of the stem. 


7.2 Prefixes and inflection classes: Eastern Highland Otomi


7.2.1 Verb classes in Eastern Highland Otomi


Eastern Highland Otomi (Otomi from now on) is a Mesoamerican language of
the Otomanguean family spoken in Mexico (Echegoyen & Voigtlander 1979). The
Otomi verb system is relevant for the proposal in this book because, like in
Swahili, it has inflection classes where the actual inflection is produced by a prefix
instead of a suffix.

The verbs are organized in four classes according to Echegoyen & Voigtlander 
(1979), and five classes according to Feist & Palancar (2015). Examples of these 
classes can be found in Table 7.9. 


Table 7.9: Otomi inflection classes


                      Class I.a    Class I.b     Class II     Class III      Class IV
                      ‘gather’     ‘save’        ‘walk’       ‘fix’          ‘hurry’
Incompletive   1st    dí joni      dí-n yäni     dí 'yo       dí-dí hoki     dí-dí xøni
               2nd    gí joni      gí-n yäni     gí 'yo       gí-dí hoki     gí-dí xøni
               3rd    (i) joni     i-n yäni      (i) 'yo      (i)-di hoki    (i)-di xøni
Imperfect      1st    dmí joni     dmí-n yäni    dmí 'yo      dmí-dí hoki    dmí-dí xøni
               2nd    gmí joni     gmí-n yäni    gmí 'yo      gmí-dí hoki    gmí-dí xøni
               3rd    mí joni      mí-n yäni     mí 'yo       mí-dí hoki     mí-dí xøni
Completive     1st    dá joni      dá yäni       dá-n 'yo     dá hoki        dá-n xøni
               2nd    gá joni      gá yäni       gá-n 'yo     gá hoki        gá-n xøni
               3rd    bi goni      bi yäni       bi-n 'yo     bi hoki        bi-n xøni
Perfect        1st    xtá joni     xtá yäni      xtá-n 'yo    xtá hoki       xtá-n xøni
               2nd    xká joni     xká yäni      xká-n 'yo    xká hoki       xká-n xøni
               3rd    xø-n goni    xø-n yäni     xø-n 'yo     xø hoki        xø-n xøni
Pluperfect     1st    xtá joni     xtá yäni      xtá-n 'yo    xtá hoki       xtá-n xøni
               2nd    xkí joni     xkí yäni      xkí-n 'yo    xkí hoki       xkí-n xøni
               3rd    xí goni      xí yäni       xí-n 'yo     xí hoki        xí-n xøni
Irrealis       1st    ga joni      ga-n yäni     da-n 'yo     ga hoki        da-n xøni
               2nd    gi joni      gi-n yäni     ga-n 'yo     gi hoki        ga-n xøni
               3rd    da goni      da yäni       di-n 'yo     da hoki        di-n xøni


Capturing the class system in Otomi requires positing five independent types,
but nonetheless there is a degree of organization between these types. The important
thing to observe here is that classes III and IV share an extra -dí- segment
in the incompletive and imperfect, while classes I and II do not have this feature.
Meanwhile, class II and class IV share the use of an extra -n in the completive,
perfect, pluperfect and irrealis. Class I.a can either be grouped with classes I.b and
III or treated as a completely independent class, depending on the property involved.


7.2.2 Materials 


For this case study I used the inflection class database by Feist & Palancar (2015)
(based on Echegoyen & Voigtlander 1979, Echegoyen & Voigtlander 2007 and
Voigtlander & Echegoyen 2007). This database contains 1998 verbs, all of which
were analyzed and assigned to one of the five classes. It also contains information
about whether the verb is transitive or not, its stem and citation form. I performed
no extra processing on the data and used it as it was.


7.2.3 Results 


In terms of complexity, the model for Otomi is probably the one with the most
factors. As predictors, I included the first three segments (with an interaction
between the first and second segment), the last two segments, the tone of the
citation form, and whether the verb is transitive or not: class ~ first.1 * first.2
+ first.3 + Transitivity + last.1 + last.2 + tone.⁵ The confusion matrix
for this model is shown in Table 7.10 and the accuracy measures in Table 7.11.


Table 7.10: Confusion matrix for the model predicting inflection class in Otomi


              Reference
Prediction     Ia    Ib     II    III     IV
Ia            609     6     46    141     56
Ib              6    29      2      8      0
II             50     2    284     27     85
III            82    15     10    249     14
IV             36     3     74     28    136


We see that classes are mostly predictable for Otomi, but there is some degree
of confusion. The accuracy metrics show that class-Ia is receiving most of the
misclassifications, which is to be expected, this being the most frequent class.
Interestingly, class-Ib is only mildly confused with class-Ia, and much more
confused with class-III.
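
The by-class figures reported in Table 7.11 can be recomputed from the counts in Table 7.10 with one-vs-rest definitions. The sketch below assumes that rows are predictions and columns are reference classes, as the table header indicates; the function name is illustrative.

```python
classes = ["Ia", "Ib", "II", "III", "IV"]

# Table 7.10: rows are predictions, columns are reference classes.
table_7_10 = [
    [609, 6, 46, 141, 56],
    [6, 29, 2, 8, 0],
    [50, 2, 284, 27, 85],
    [82, 15, 10, 249, 14],
    [36, 3, 74, 28, 136],
]


def by_class_statistics(matrix, labels):
    """Return sensitivity, specificity and balanced accuracy for each class (one-vs-rest)."""
    total = sum(sum(row) for row in matrix)
    stats = {}
    for k, label in enumerate(labels):
        true_pos = matrix[k][k]
        reference_total = sum(row[k] for row in matrix)  # column total
        predicted_total = sum(matrix[k])                  # row total
        false_pos = predicted_total - true_pos
        true_neg = total - reference_total - false_pos
        sensitivity = true_pos / reference_total
        specificity = true_neg / (total - reference_total)
        stats[label] = (sensitivity, specificity, (sensitivity + specificity) / 2)
    return stats


stats = by_class_statistics(table_7_10, classes)
for label, (sens, spec, bal) in stats.items():
    print(f"{label:>3}  sensitivity={sens:.4f}  specificity={spec:.4f}  balanced={bal:.4f}")
```

The low sensitivity of class Ib (29 of 55 items) alongside its very high specificity reflects the fact that the model rarely predicts this small class at all.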

The important fact regarding Otomi is the relative effects of the different fac- 
tors. In Swahili we saw that both the first segments and final segment of the 
nouns carried some information about gender. In this case, we have more or less 
the same situation. Figure 7.3 shows the additive and subtractive model evalu- 
ation plots. On the left, we see that all factors used provide small increases to 
model performance. Moreover, on the right, we see that the two most important 
factors were the interaction between the first two segments of the verb and the


⁵ The model contained no hidden nodes and a decay rate of 0.1.
Table 7.11: Accuracy scores for Table 7.10


Overall Statistics


Accuracy : 0.6542
95% CI : (0.6328, 0.675)
No Information Rate : 0.3919
Kappa : 0.5211


Statistics by Class:


                     Class: Ia   Class: Ib   Class: II   Class: III   Class: IV
Sensitivity             0.7778     0.52727      0.6827       0.5497     0.46735
Specificity             0.7951     0.99177      0.8963       0.9217     0.91740
Neg Pred Value          0.8474     0.98669      0.9148       0.8747     0.90994
Balanced Accuracy       0.7864     0.75952      0.7895       0.7357     0.69238


verb’s transitivity. The interesting thing to note is that the first segments were 
much more important for predicting inflection class than the final segments. 


Figure 7.3: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting inflection class


Once more, classes that trigger prefixing processes are predictable from analogies
based on the beginning of words, much more so than from analogies based on the
endings. The fact that the endings did play a minor role is interesting. It probably
means that both Otomi and Swahili are susceptible to word-size schemas, similar
to how gender in German nouns is determined by both initial and final segments
(Köpcke & Zubin 1984).


7.3 Stem changing processes: Hausa plural classes 


7.3.1 The Hausa plural system 


The Hausa plural system is too complex to be fully explored here, but some of its
properties are relevant to the overall theme of this chapter. First, there seems to
be little agreement with regard to how many plural classes there are in Hausa:
analyses range from “many” (Migeod 1914) and around thirty
(Schön 1862) to twenty macro-classes (Newman 2000), or the many more subclasses
Newman identifies. For this study I follow the macro-classes defined by
Newman (2000), which are given in Table 7.12.

As we can observe in Table 7.12, some plural classes assign their own tonal 
pattern to the plural forms, independently of the tonal patterns of the singular, 
while others carry over the tonal pattern of the singular class (Newman 2000: 
430). There are several reduplication patterns, and several ‘broken’ plurals, where 
there is a vocalic change before and after the final consonant of the singular. It 
is worth keeping in mind that these are macro-classes, and one could find an 
even more fine-grained division, with many subdivisions within each of these 
classes. Because of this fact, there are no good arguments in favor of a specific 
hierarchical organization of these classes. 

Newman (2000: chapter 56) observes several regularities in the formation of 
plurals. He mentions, for example, that -aCe plurals only occur with CVCVV 
nouns, while a-a plurals tend to appear with CVCCVV nouns (p. 431). Newman 
gives similar patterns for other macro-classes, but states that ultimately Hausa 
plurals are not fully predictable. 


7.3.2 Materials 


I extracted all nouns from A Hausa-English Dictionary and English-Hausa Vocab- 
ulary by Bargery & Westermann (1951). The dictionary contains around 3000 


⁶ Because the dictionary I use for the data (Bargery & Westermann 1951) does not distinguish
between the retroflex and rolled r, and between long and short vowels, I will not mark these
features here. For tone representation I follow Newman (2000), with high tone unmarked, low
tone marked with a grave accent, and falling tone with a circumflex accent.
Table 7.12: Hausa plural macro-classes


Class    Singular    Plural           Gloss
a-a      sirdi       siràda           ‘saddle’
a-e      gulbi       gulàbe           ‘stream’
a-u      kurmi       kuramu           ‘grove’
-aCe     wuri        wurare           ‘place’
-ai      malàm       malamai          ‘teacher’
-anni    watà        watanni          ‘moon’
-awa     talaka      talakawa         ‘commoner’
-aye     zomo        zomaye           ‘hare’
-Ca      tabo        tabba            ‘scar’
-Cai     tudu        tùddai           ‘high ground’
-ce2     ciwò        ciwace-ciwace    ‘illness’
-Cuna    ciki        cikkuna          ‘belly’
-e2      camfi       camfe-camfe      ‘superstition’
-i       tàurarò     tàuràri          ‘star’
-oCi     tagà        tagogi           ‘window’
-u       kujèra      kùjèru           ‘chair’
u-a      cokali      cokulà           ‘spoon’
-uka     layi        layuka           ‘lane’
-una     riga        rigunà           ‘gown’
X2       àkàwu       àkàwu-àkàwu      ‘clerk’


nouns, of which only some 1450 have a plural. Of these, quite a few have indi- 
cations about multiple alternatives. Some of the alternatives are marked as rare, 
or for regional preferences. It is not really possible to work with these overabun- 
dant variants (Migeod 1914; Salim 1981; Newman 2000) because there are just 
not enough of them (around 150). As a practical solution, I simply took the first 
variant offered in the dictionary and ignored the rest. Similarly, in cases where 
the dictionary offered multiple possible singulars for a noun, I only used the first 
singular form listed. 

Identifying plural classes automatically in Hausa is not a trivial task, and it is
not completely clear how many examples fit into Newman’s (2000) macro-classes.
I followed the definitions as given in Table 7.12. Although this approach is likely
to produce some errors, it should mostly give us the right classification. The main
difference in the classes I use is that I take four reduplication classes instead of
the three listed in Table 7.12: class-RED-e and class-RED-comp correspond to the
class-X2 and class-e2 classes identified by Newman (2000). I included class-RED-id,
which consists of cases where the plural is the reduplication of the singular form
without additional changes, and a general class-RED class with all the cases that
do not quite fit into any of the other classes. The class-ce2 did not have enough
members to be usable. Finally, an extra class I include is class-oi, which is not
explicitly mentioned by Newman (2000), but which had enough members to be
distinguished as an independent macro-class. We can see the frequencies of the
classes in the data-set in Figure 7.4.
Figure 7.4: Type frequency of macro-classes in Hausa


As expected, some classes are considerably more frequent than others, and the
general distribution is roughly Zipfian. However, it is hard to tell which of these
classes are productive, which are irregular, and which are misanalyses.

A serious shortcoming of this database is the lack of information about vowel 
length. According to Newman (p.c.), several of the macro-classes are strongly 
correlated with vowel length of the singular, which means there is an important 
factor missing. 


7.3.3 Results 


First we look at a model predicting the plural class from structurally defined
predictors. Since most of the macro-classes presented in Figure 7.4 are defined by
two vowels and a potential consonant between them, I defined the predictors
as follows: plural_class ~ V.1*T.1 + C.1 + V.2 + CVCV.4 + length.⁷ Here,
V.1 and V.2 are the final and prefinal vowels, respectively, C.1 is the final consonant,
T.1 is the final tone of the singular, length is the length in letters, and CVCV.4
is the CV structure of the final four segments of the singular. In this case, we are
specifying an interaction between the final vowel and the tone of that vowel.
Newman (2000: chapter 56) makes reference to all these factors, in some way or
another, in his analysis of the Hausa plurals. It is therefore no surprise that they
play a role in the analogical model.
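
The structural predictors described above can be read off a singular form with a few string operations. The following sketch is hypothetical: it assumes that tone marks have already been stripped, that every letter counts as one segment, and that the vowels are a, e, i, o and u. The tone predictor T.1 is left out because it depends on how tone is encoded in the data, and the function name is illustrative.

```python
VOWELS = set("aeiou")


def structural_features(singular):
    """Extract V.1, V.2, C.1, CVCV.4 and length from a singular form.

    Assumes tone marks were stripped beforehand and each letter is one segment.
    """
    vowel_positions = [i for i, ch in enumerate(singular) if ch in VOWELS]
    consonant_positions = [i for i, ch in enumerate(singular) if ch not in VOWELS]
    last_four = singular[-4:]
    return {
        "V.1": singular[vowel_positions[-1]] if vowel_positions else None,
        "V.2": singular[vowel_positions[-2]] if len(vowel_positions) > 1 else None,
        "C.1": singular[consonant_positions[-1]] if consonant_positions else None,
        "CVCV.4": "".join("V" if ch in VOWELS else "C" for ch in last_four),
        "length": len(singular),
    }


# Singulars taken from Table 7.12.
print(structural_features("sirdi"))
print(structural_features("gulbi"))
```

For sirdi and gulbi the sketch yields the same CV skeleton (VCCV) and final vowel, but different prefinal vowels and final consonants, which is the kind of contrast the structurally defined predictors are meant to capture.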

The results of this model can be seen in Figure 7.5 and the corresponding statistics
are presented in Table 7.13. We see that most classes can be predicted to a
relatively high degree of accuracy. There is a clear darker trace along the main
diagonal in Figure 7.5, but with some noise for most classes.⁸ There are
errors across most classes with no clear structure to them, besides some apparent
foci (class-a-a, class-a-e, class-ai, class-Cai and class-oCi). The accuracy statistics
do reveal that the model is performing well above chance, and that there is
a significant analogical relation between these classes.

For comparison, the results of a model that does not specify structural analogy,
plural_class ~ final.1*T.1 + final.2 + final.3 + CVCV.4 + Length,⁹ can be seen
in Table 7.14. It is not surprising that this model also performs relatively well;
after all, the predictor final.1 captures the same information as the predictor
V.1.


Table 7.13: Accuracy scores for Figure 7.5 


Overall Statistics 


Accuracy : 0.5425 
95% CI : (0.5161, 0.5686) 
No Information Rate : 0.2082 
Kappa : 0.488 


We can compare model performance for both models (Figure 7.6 and Figure 7.7).
These evaluations reveal that indeed final.1 and V.1 have more or less the same
impact on the model, but for the non-structurally defined model all other
predictors become rather insignificant in the subtractive evaluation. The segments
captured by both models are the same, but the additional structure does clearly
play a role.

We can also see that the more structural predictors not only achieve a higher
accuracy, but also have more independent weights (higher accuracy in the
subtractive evaluations). The main factors are clearly the vowels (and their
interaction with tone), while the consonant has less influence. This strongly matches
the broken plurals we see in Hausa, where the consonant remains stable and the
vowels before and after it are changed.


"The model included one hidden layer with five nodes and a decay rate of 0.1. Gender did not
play a significant role in any of the models.

*Because the numbers used for shading are log scaled from the actual confusion matrix, the
error rates appear slightly higher than they actually are.

?The model included no hidden nodes and a decay rate of 0.1.


Figure 7.5: Heatmap for the model predicting plural forms in Hausa


Table 7.14: Accuracy scores for the non-structurally defined model


Overall Statistics


Accuracy : 0.5057
95% CI : (0.4792, 0.5321)
No Information Rate : 0.2082
Kappa : 0.4516


Figure 7.7: Additive (left) and subtractive (right) accuracy and kappa
scores for the non-structurally defined model
7.4 Interim conclusion


In this chapter I have provided some evidence for a different aspect of analogical 
models, namely the fact that the analogical specifications, or the points where 
the analogy takes place, can be related to the actual morphological process. In 
Swahili and Otomi we see that a prefixing system triggers analogy mostly at the 
beginning of words, and in Hausa we see how the analogical relation requires 
a specification that is similar to the actual structure of some plural classes. The 
results of this chapter should be taken only as a starting point. Two languages for 
prefixes is too small a sample to draw any definitive conclusions. As mentioned 
in Part I, this problem had already been raised before: 


The problem faced in the full elaboration of such models, however, is in 
specifying the relevant features upon which similarity is measured. This is 
a pressing empirical problem. We need to ask, why are the final consonants 
of the strong verbs more important than the initial ones? (Bybee 2010: 62) 


This observation is very difficult to explain from a formal perspective. Assuming 
the model introduced in Part I is right, there is no way for the hierarchy to ‘know’ 
what kind of morphological process is being carried out on the different types, 
and to link that to the inheritance of analogical constraints. From a usage-based 
perspective, however, these results make more sense. A potential explanation is 
that speakers are more focused on finding similarities between words where the 
important changes happen, i.e., the segments before a suffix or after a prefix. This 
would also explain why there seems to be a distance effect from the edge in most 
of the other languages, that is, the very last segment tends to be more important 
than the second to last and so on (though not always). A possible advantage of 
this explanation is that it also helps reduce the search space for speakers. Unless
there were some innate constraint that specified where to look for analogies,
speakers would have to analogize over all segments of all stems. The fact that
analogies seem to be mostly constrained to the edge of the stem where the
morphological process happens helps reduce the amount of information that has to
be considered. This variability of the ‘where’ of the analogy is an advantage for
speakers of the language and not a drawback. 
So far I have only looked at systems with relatively few classes, and hierarchies
with few types. This chapter looks at three examples where the systems are con- 
siderably larger, with many more inflection classes, and which require more com- 
plex type hierarchies. The main question here is what happens with the analog- 
ical relations, particularly the analogical similarities between classes, when the 
type hierarchies are made up of several interacting sub-trees. 


8.1 Multiple inheritance and cross-hierarchies: Spanish 
verbal inflection 


8.1.1 Spanish inflection classes 


In Spanish there are three clear inflectional classes given by the thematic vowel
of the verb: -a(r) (e.g. cantar ‘to sing’), -e(r) (e.g. correr ‘to run’) and -i(r) (e.g.
reír ‘to laugh’), also referred to as first class, second class and third class,
respectively. Depending on the variety, inflectional paradigms in Spanish consist of
around 53 content cells, exemplified in Table 8.1 for amar ‘to love’. The 2PL forms
given in Table 8.1 are only found in Spain, with Latin American Spanish using
the 3PL form for the 2PL. Additionally, the future subjunctive is rare, and it is
found mostly in fixed expressions: sea lo que fuere ‘whatever it may be’.¹ Finally,
the imperfect subjunctive exhibits overabundance (Thornton 2010a,b) between 
-se and -ra, with both forms having exactly the same morphosyntactic content 
(Cuervo & Ahumada 1981; DeMello 1993; Kempas 2011; Rojo 2008; Rosemeyer & 
Schwenter 2019). 

The literature recognizes two macroclasses in the inflectional system of Span- 
ish based on their thematic vowel: verbs ending in -ar vs. verbs ending in -er or 
-ir (Aguirre & Dressler 2008 among many others). This distinction is easy to see 
from the partial inflectional paradigm of regular verbs in Table 8.2. The second 


¹ Notice, however, that it is easy to find uses of this form online: Demos la vida si fuere
necesario ‘let us give our lives if it should be needed’.
http://portaluz.org/demos-la-vida-si-fuere-necesario-1570.htm, consulted 12-11-2016.


8 Complex inflectional classes 


Table 8.1: Complete paradigm for amar ‘to love’


Indicative
       Present    Imperfect       Preterite    Future
1SG    amo        amaba           amé          amaré
2SG    amas       amabas          amaste       amarás
3SG    ama        amaba           amó          amará
1PL    amamos     amábamos        amamos       amaremos
2PL    amáis      amabais         amasteis     amaréis
3PL    aman       amaban          amaron       amarán

Conditional
1SG    amaría
2SG    amarías
3SG    amaría
1PL    amaríamos
2PL    amaríais
3PL    amarían

Subjunctive
       Present    Imperfect       Preterite    Future
1SG    ame        ama(se/ra)                   amare
2SG    ames       ama(se/ra)s                  amares
3SG    ame        ama(se/ra)                   amare
1PL    amemos     amá(se/ra)mos                amáremos
2PL    améis      ama(se/ra)is                 amareis
3PL    amen       ama(se/ra)n                  amaren

Imperative
2SG    ama
2PL    amad

Infinitive    Gerund    Participle
amar          amando    amado
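
The imperfect subjunctive cells in Table 8.1 use the shorthand ama(se/ra)s for the two overabundant forms. A small Python sketch of how this notation can be expanded into the full forms is given below; the function name is hypothetical, and the code only handles a single parenthesised group, which is all the table requires.

```python
import re


def expand_alternatives(form):
    """Expand one '(x/y)' group into the full list of alternative forms."""
    match = re.search(r"\(([^()]*)\)", form)
    if match is None:
        return [form]  # nothing to expand
    before, after = form[:match.start()], form[match.end():]
    return [before + option + after for option in match.group(1).split("/")]
```

Applied to ama(se/ra)s, the function returns amases and amaras; a form without parentheses is returned unchanged.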
and third person singular and the third person plural exponents of the second
and third classes are the same, while these forms are different for the first class. 
The three classes are only clearly distinguished in the first and second person 
plural. There are no shared exponents between class 1 and one of the other two 
classes to the exclusion of the remaining class. 


Table 8.2: Simple present paradigm of Spanish regular verbs


Person/Number   cant-ar ‘to sing’   corr-er ‘to run’   aburr-ir ‘to bore’

1SG             cant-o              corr-o             aburr-o
2SG             cant-as             corr-es            aburr-es
3SG             cant-a              corr-e             aburr-e
1PL             cant-amos           corr-emos          aburr-imos
2PL             cant-áis            corr-éis           aburr-ís
3PL             cant-an             corr-en            aburr-en
participle      cant-ado            corr-ido           aburr-ido
gerund          cant-ando           corr-iendo         aburr-iendo
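
The observations about shared exponents can be checked mechanically. The Python sketch below encodes the person/number exponents of Table 8.2 and groups the classes that share an exponent in each cell; the data structure and function name are illustrative choices, not part of the analysis itself.

```python
# Present-tense exponents from Table 8.2, keyed by cell and then by class.
EXPONENTS = {
    "1SG": {"ar": "o", "er": "o", "ir": "o"},
    "2SG": {"ar": "as", "er": "es", "ir": "es"},
    "3SG": {"ar": "a", "er": "e", "ir": "e"},
    "1PL": {"ar": "amos", "er": "emos", "ir": "imos"},
    "2PL": {"ar": "áis", "er": "éis", "ir": "ís"},
    "3PL": {"ar": "an", "er": "en", "ir": "en"},
}


def groupings(exponents):
    """For each cell, group the classes that share an exponent."""
    result = {}
    for cell, by_class in exponents.items():
        groups = {}
        for cls, exponent in by_class.items():
            groups.setdefault(exponent, []).append(cls)
        result[cell] = sorted(groups.values())
    return result
```

In the output, er and ir fall together in the 2SG, 3SG and 3PL, all three classes are kept apart in the 1PL and 2PL, and no cell groups ar with only one of the other two classes.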


Some alternative descriptions of the Spanish system have been proposed be- 
fore. Boyé & Cabredo Hofherr (2006) suggest that thematic vowels seem to be 
a property of stems rather than verbs themselves. The authors base this claim 
on the fact that some irregular verbs show signs of having a different thematic 
vowel in certain stems: andar ‘to go, walk’ - anduve (1sg preterite) and anduviste 
(2sg preterite). This might very well be the case, but it is a very rare phenomenon 
in Spanish, and it is currently eroding for andar, with speakers using the more 
regular forms: andé and andaste. I will exclusively focus on the infinitive stem of 
the verb, and its changes for the present singular, past participle and gerund. For 
these cells, even a verb like andar uses the same stem: ando, andado, andando, 
respectively. I will keep the traditional view of the Spanish system, in which
thematic vowels are a property of lexemes and there are three main inflection
classes based on said thematic vowels.

It should be clear, however, that three classes are insufficient to fully describe
the inflectional behaviour of Spanish verbs. The main reason is that many verbs
exhibit semi-regular conjugation patterns (some authors classify all these
patterns under the umbrella of irregular, cf. Brovetto & Ullman 2005, but this kind of
approach completely ignores that there are partial regularities within the
different inflectional patterns, cf. Maiden 2001; 2005). The main process responsible for
the minor conjugation patterns is diphthongization, but there are other
stem-changing processes. A few examples of different patterns found in the Spanish
verbal paradigm are presented in Table 8.3.² This table shows how the three
principal parts of the Spanish system (first person singular, past participle and
gerund) are taken to determine the inflection class of all regular and semi-regular
verbs (cases like ir ‘to go’ are completely irregular and their inflection cannot be
determined by their principal parts).


Table 8.3: Minor conjugation patterns of Spanish verbs


verb           gloss        pattern   1SG            participle      gerund

escribir       write        /b~t/     escribo        escrito         escribiendo
elegir         choose       /e~i/     elijo          elegido         eligiendo
controvertir   controvert   /e~je/    controvierto   controvertido   controvirtiendo
descomponer    decompose    /g/       descompongo    descompuesto    descomponiendo
contraer       contract     /ig/      contraigo      contraído       contrayendo
adquirir       acquire      /i~je/    adquiero       adquirido       adquiriendo
fluir          flow         /j/       fluyo          fluido          fluyendo
aprobar        approve      /o~we/    apruebo        aprobado        aprobando
jugar          play         /u~we/    juego          jugado          jugando
humedecer      humidify     /θ~θk/    humedezco      humedecido      humedeciendo


There are three macroclasses of verbal inflection: ar, ir and er, responsible for 
the inflectional endings, and multiple other minor (stem) patterns responsible for 
stem alternations in certain cells of the paradigm. The exact number of classes 
depends on how one classifies them and groups them. Mateo & Sastre (1995) find 
around 90 classes, but many of these are verb-specific. I take a more conserva- 
tive approach where I only take into account classes with more than one lexeme. 
Although different partitions of the stem patterns are possible, I will focus exclu- 
sively on those shown in Table 8.3. 

An important point here is that many of the stem alternation classes in Ta- 
ble 8.3 can also apply to nouns and adjectives: cuento ‘tale’, vejez ‘old age’, viejo 
‘old’, poblado ‘populated’, población ‘population’, pueblo ‘town’. Although I will 
only focus on verbs, the same hierarchy could be used for modelling stem alter- 
nations in nouns and adjectives. This is further evidence for the independence of 
thematic vowels from stem alternations. 


8.1.2 Previous takes on the Spanish verbal system 


Some older studies on the phenomenon of Spanish verbal inflectional classes 
considered the stem patterns to be the product of a sort of irregular or non- 


² Notice that the actual realization of j depends on the dialect. Also, in American Spanish, the
/θ/ would be an /s/, but the pattern remains the same.
systematic inflection triggered by diacritics/features on the relevant verbs (Foley
1965; Brame & Bordelois 1973; Harris 1969; 1978) or by complex representations 
of the lexical entries which include the possible alternants a verb can exhibit 
(Hooper 1976). These analyses are phonological in nature, and assume a homoge- 
neous morphological system. Brame & Bordelois (1973: 43) also claim that “it is 
impossible to predict whether any of these segments will alternate or not” and 
thus suggest hard-wiring whether a noun or verb will alternate or not. 

Some recent approaches from a DM perspective (Arregi 2000), and an au- 
tosegmental OT perspective (Roca 2010), seem to make the same assumption 
that “[c]onjugation class membership is unpredictable” (Roca 2010: 412). Similarly,
Bermúdez-Otero (2013: 3), talking about diphthongization in verbs, nouns
and adjectives, also claims that "[t]he choice of theme vowels in Spanish nouns
and adjectives can be predicted neither from the phonological shape of roots nor 
from syntactic features like gender". He concludes that verbs are stored with their 
thematic vowel instead of having additional inflectional information. 

Spanish verbal inflection has also been used in the debate between dual- and
single-route approaches to morphological processing and acquisition (Brovetto &
Ullman 2005; Clahsen et al. 2002; Costanzo 2011; Eddington 2009; Yaden 2003), 
language change (Galván Torres 2007; Wanner 2006), as well as to test different 
computational models of analogy (Albright 2009). Most of these studies focus on 
the nature of psycholinguistic processing and mental representations, but I will 
not focus on these issues (for a detailed review of the literature on the topic of 
mental representation of Spanish verbal inflection, see Eddington 2009). 

There are multiple accounts of the diphthongization processes shown in Table 8.3
from a synchronic (Bellido 1986; Carreira 1991; Harris 1985; Kikuchi 1997)
and diachronic (Wilkinson 1971) perspective, but these deal almost exclusively 
with the phonological process itself, and do not actually discuss which verbs un- 
dergo the diphthongization process. Additionally, most of these accounts focus 
on the vocalic changes and ignore consonant alternations. Regarding possible 
regularities that might predict these patterns, Roca (2010) claims that: 


[...] contemporarily, diphthongization is lexically conditioned, non-diph- 
thongising e, o being plentiful: cf. vejár ~ vejo ‘to ~ I slight’, podar ~ podo 
‘to ~ I prune’, etc. Albright et al. (2001) report a number of frequency effects 
associated with contextual segmental correlations, but minimal pairs like 
muelo ‘I grind’ vs. molo ‘I am/look cool’, respectively from moler, molar, 
or puedo ‘I can’ vs. podo ‘I prune’, from poder, podar, confirm the
unpredictability of lexical incidence. Note that conjugation class is also irrelevant:
vuelo ‘I fly’, ruedo ‘I roll’, from 1st conj volar, rodar (Roca 2010: 423)
But the author confuses two things in a slightly disingenuous way. First, the
minimal pairs for podar ~ poder and molar ~ moler look alike in their stem but 
belong to two different classes, while volar and rodar belong to the same class 
but do not look alike. The first example shows that major inflection class mem- 
bership is not fully determined by the shape of the stem, but does not show that 
diphthongization is not predictable within classes. 

In a similar vein, Harris (1985) claims that:


[a]s has long been recognised [...] segmental phonological and morpholog- 
ical conditions do not suffice to predict the occurrence or non-occurrence 
of diphthongization. It follows that some otherwise unmotivated property 
of the representation - i.e. a lexical diacritic - must be employed to distin- 
guish the alternating from the non-alternating cases, regardless of whether 
vowels or diphthongs are taken to underlie the alternation (Harris 1985: 32) 


However, Harris fails to provide any kind of evidence for the unpredictability 
of diphthongization. 

A study by Eddington (1996) deals with the degrees to which different deriva- 
tional processes make use of these diphthongs, but the author also claims that 
"of course, since not all mid-vowels are subject to diphthongization, those which 
are must be so designated by means of a diacritic or some other formal entity" 
(p. 9). 

The first hint of an analogical relation holding between these stem alternation
patterns, and specifically the diphthongization, was reported by Malkiel (1966).
The author noted that ie tends to be changed to i in the presence of an s com- 
bined with an r or v. Malkiel does not present a full analogical model for all 
conjugation patterns, though. A more elaborate model was proposed by Boyé & 
Cabredo Hofherr (2004), who observe that the thematic vowel and vocalic alter- 
nation of the stem is predictable, to some degree, from the prethematic vowel. 
The authors do not, however, provide a full model capable of accurately predict- 
ing inflection class. In their conclusion, they claim that the difference between 
-ir and -er is due to vocalic harmony, and both suffixes are really allomorphs of 
the same subjacent morpheme (p. 259). 

The main work that deals with analogy in the Spanish system comes from 
studies by Albright (Albright et al. 2001; Albright 2008b; 2009). Albright (2008b) 
shows that -er verbs have no high vowels in their stem, and verbs in -ir tend not
to have the vowel /o/. He also shows that the rates of the types of vocalic changes 
are heavily conditioned by the main inflection class. But Albright (2008b: 3) still 
claims that "the choice of diphthongization vs. raising is not predictable". One 
important point Albright (2008b) makes is that speakers seem to keep generalizations
about verbs with regard to stem patterns internal to their main inflection
class. That is, an -ar verb will not analogize to -ir or -er verbs. I will test this 
conclusion with the models below. 

The most recent work on analogy in Spanish inflection is presented by Albright 
(2009). In this paper the author shows how a minimal generalization learner
(Albright & Hayes 2002) can predict whether a verbal stem in Spanish would
undergo diphthongization or not. As described before, a minimal generalization
learner finds regular and semi-regular patterns, similar to schemas, that
predict class membership, and weights them according to how frequent and how 
general or how specific these patterns are. 

One of the main claims by Albright (2009) is that structural analogy is more 
predictive than pure surface similarity. This claim is tested against psycholinguistic
data. Albright et al. (2001) tested 96 native Spanish speakers on new possible
verbs (wugs) to see the rate of diphthongization these would have. Speakers were
asked to produce the inflected forms of 33 wug items containing a mid vowel (e.g.
lerrar). The analogical model proposed by Albright (2009) reached a correlation
coefficient, r, of 0.77 when compared to experimental data. Additionally, Albright
(2009) tested a less structured model, one that only takes into account surface
similarity without structural similarity. The unstructured model reached an r of 0.56,
clearly showing that the minimal generalization learner has better performance
when predicting speakers’ behaviour. However, Albright (2009) only tackles the
binary distinction: diphthong vs no diphthong. There is no attempt at modelling 
all inflectional patterns, or a significant subset of these. There are no previous 
attempts at modelling the full Spanish inflectional system with analogy. 


8.1.3 Modelling the system 


We need a way of classifying and relating stems to major inflection patterns for 
Spanish verbs. A simple alternative to capture the fact that the er and ir classes 
behave as a single class in opposition to the ar class, is with a hierarchy as in 
Figure 8.1. 

But this model is insufficient if we also want to capture semi-regular patterns. 
Table 8.4³ presents the cross-tabulated distribution of stem and major patterns,
as they appear in a list of around 3000 Spanish verbs (see below for the data 
description). From this table it is clear that there is no obvious systematicity to 


³ I use the letter L to indicate the j class, and the letter z to mark the /θ/ sound (as is the norm
in Spanish).
v-classes
    ar
    non-ar
        er
        ir


Figure 8.1: Basic hierarchy for Spanish theme vowels


the patterns.⁴ Which patterns a verb belongs to has to be specified independently.


Table 8.4: Number of verbs by pattern and thematic vowel in a sample
of 3054 Spanish verbs


                   Thematic vowel
                   a       e       i

b-t                0       0       9
e-i                0       0       23
e-ie               65      17      32
g                  0       31      1
ig                 0       u       0
i-ie               0       0       2
i~iet              0       0       6
suppletion         1       9       10
L (j)              0       0       31
o~ue               51      22      2
non-alternating    2409    79      143
u~ue               1       0       0
z (/θ/)-zc         0       73      16

Boyé & Cabredo Hofherr (2006) suggest that the analysis of verbal inflection 
in Spanish should make use of the stem space (Bonami & Boyé 2003), that is, a 
list of stems that cover all cells in the paradigm of a verb: “lexemes should rather 
be associated with a vector of possibly different phonological representations" 
(Bonami & Boyé 2006). This stem space partitions the paradigm in a regular way, 


⁴ Here, non-alternating stands for verbs with no special pattern, and suppletion for some verbs
with patterns that only apply to them, stem suppletion (Boyé & Cabredo Hofherr 2006), and
verbs derived from these (e.g. decir ‘to say’, and bendecir ‘to bless’).
and it is a morphomic property of the paradigm. Boyé & Cabredo Hofherr (2006)
show, as Maiden (2001) did before, how certain tenses, with no apparent semantic
connection, use the same stem (the authors identify eleven stems in total, p. 6).
This proposal makes sense for the system. These patterns only affect the stems, 
and are independent of the thematic vowel of the verb. The implication is then 
that there is an independent hierarchy which captures the stem alternation sys- 
tem. 

There are many ways to capture the patterns in Table 8.3, especially because 
this is not a complete list. Depending on what one considers to be an inflectional 
pattern, the list can be much longer (some lists mention up to 101 verbal
patterns).⁵ If we only focus on the patterns listed here, the basic type hierarchy
in Figure 8.2⁶ is sufficient.


stem-space
    non-alternating
    alternating
        suppletion
        stem-pattern
            vocalic
                e~i
                e~je
                i~je
                o~we
                u~we
            non-vocalic
                d/b~t
                g
                ig
                j
                θ~θk

Figure 8.2: Hierarchy for Spanish verb stem alternations


Notice that it is not necessary to list the specific position for the phonological
process in the case of diphthongization because this process necessarily
applies to the stressed syllable, except when the item appears with a derivational 
suffix that attracts stress like the diminutive -ito (poblar ~ pueblo ~ pueblito, ‘to 
populate’, ‘town’, ‘small town’) (Carreira 1991). 

Combining the hierarchies in Figure 8.2 and Figure 8.1 produces a
cross-classification as in Figure 8.3. Notice that in this hierarchy, the classes theme-
vowel and stem-space refer to two different kinds of processes, or aspects of verb
inflection that interact with each other. 
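
To make the idea of cross-classifying two independent hierarchies concrete, the Python sketch below encodes the hierarchies of Figure 8.1 and Figure 8.2 as child-to-parent maps and pairs their most specific types. It is only an illustration: the representation and the function names are assumptions, and pairing every leaf with every leaf overgenerates with respect to the attested system (for instance, -ar verbs show no non-vocalic stem alternations).

```python
from itertools import product

# child -> parent maps for the two hierarchies (Figures 8.1 and 8.2)
THEME_VOWEL = {"ar": "v-classes", "non-ar": "v-classes",
               "er": "non-ar", "ir": "non-ar"}
STEM_SPACE = {
    "non-alternating": "stem-space", "alternating": "stem-space",
    "suppletion": "alternating", "stem-pattern": "alternating",
    "vocalic": "stem-pattern", "non-vocalic": "stem-pattern",
    "e~i": "vocalic", "e~je": "vocalic", "i~je": "vocalic",
    "o~we": "vocalic", "u~we": "vocalic",
    "b~t": "non-vocalic", "g": "non-vocalic", "ig": "non-vocalic",
    "j": "non-vocalic", "θ~θk": "non-vocalic",
}


def supertypes(node, parents):
    """All types that `node` inherits from, nearest first."""
    chain = []
    while node in parents:
        node = parents[node]
        chain.append(node)
    return chain


def leaves(parents):
    """Types that are nobody's parent, i.e. the maximally specific types."""
    return sorted(set(parents) - set(parents.values()))


def cross_classify(first, second):
    """Every pairing of a leaf of one hierarchy with a leaf of the other."""
    return list(product(leaves(first), leaves(second)))
```

A verb such as aprobar would then be an instance of the pair (ar, o~we), inheriting its endings from one hierarchy and its stem alternation from the other.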


⁵ http://www.verbolog.com/conjuga.htm, visited 20.10.2016.
⁶ The use of an irregular type is not really needed, however. Completely irregular verbs can be
modelled by using lexical entries with a fully specified, and irregular, stem space.
Additional evidence for postulating cross-classification of two independent
hierarchies comes from two observations. First, as mentioned before, some of
the stem alternations are not exclusively restricted to verbs, but can also appear
in nouns: dental ~ diente ‘dental’, ‘tooth’, pernil ~ pierna ‘(animal) leg’, ‘leg’, molar
~ muela ‘back tooth’, etc. (Carreira 1991). Second, the case of poner ‘put’ suggests
that cross-classification can also occur within the stem alternation hierarchy, as
it would belong to both types /g/ (1SG pongo) and /o~we/ (PP puesto). For the
purposes of this study I will ignore these interactions due to their sparsity (see
Fondow 2010 for a historical take on this particular class of verbs in Spanish).

The hierarchy concerning the thematic vowel in Figure 8.1 can be said to only 
be relevant for the actual endings in the inflected forms, but not so much for 
the stem alternations, besides specifying that -ar verbs do not seem to exhibit 
any non-vocalic stem alternation. At this point we cannot tell whether this is an 
accidental gap or a fact we should hardwire into the grammar. In contrast, the 
hierarchy in Figure 8.2 is about the stem alternations found in the different verbs. 

Although Boyé & Cabredo Hofherr (2006) argue for the need of eleven stems 
for the Spanish paradigm, I will only focus here on the stems for the principal 
parts of verbs, since the other stems can be easily integrated into this system. I 
use a simplified stem specification as in (1) for Spanish verbs. 


(1)  STEMS [ SLOT1
             SLOT2
             SLOT3 ]


In (1) SLOT1 is the stem of the 1sg present, SLOT2 is the stem of the past participle,
and SLOT3 is the stem of the gerund. With this, a regular verb like amar ‘to love’
would have a stem specification as in (2), but a completely irregular verb like ir
‘to go’ would have a stem specification as in (3).


(2)  STEMS [ SLOT1  am
             SLOT2  am
             SLOT3  am ]


(3)  STEMS [ SLOT1  voy
             SLOT2  ido
             SLOT3  y ]
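
The simplified stem specification translates directly into a small data structure. The Python sketch below is one possible rendering under the assumption that each slot holds a plain orthographic string; the class and method names are hypothetical and not part of the formal analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stems:
    slot1: str  # stem of the 1sg present
    slot2: str  # stem of the past participle
    slot3: str  # stem of the gerund

    def is_non_alternating(self):
        """True when all three slots hold the same string, as in (2)."""
        return self.slot1 == self.slot2 == self.slot3


AMAR = Stems("am", "am", "am")   # specification (2)
IR = Stems("voy", "ido", "y")    # specification (3)
```

Here AMAR.is_non_alternating() holds while IR.is_non_alternating() does not, which is the contrast between (2) and (3).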


As pointed out before, however, the stem alternations of most verbs are not un- 
systematic, and we would like to capture these patterns. Additionally, we would 
like to avoid directional implicational relations, where one stem is used to derive
all other stems, thus giving it some special status. I present here a very simple
sketch that aims to achieve this. The point is to define the stem alternation types
as constraints on the alternations seen for a verb of such a type. So, for the type
b~t⁷, we have a constraint as in (4) where the co-indexed boxes indicate string
identity.


(4)  b/d~t ⇒ STEMS [ SLOT1  [1]+b-
                     SLOT2  [1]+t-
                     SLOT3  [1]+b- ]

Similar constraints for all other alternations presented in Table 8.3 can be
defined. Some examples are shown in (5) and (6).


(5)  e~i ⇒ STEMS [ SLOT1  [1]+i+[2]-
                   SLOT2  [1]+e+[2]-
                   SLOT3  [1]+i+[2]- ]


(6)  e~je ⇒ STEMS [ SLOT1  [1]+je+[2]-
                    SLOT2  [1]+e+[2]-
                    SLOT3  [1]+i+[2]- ]

The g pattern is only present in the verbs poner ‘to put’, venir ‘to come’, tener
‘to have’, valer ‘to be worth’, and salir ‘to leave, exit’, and all their derivatives.
The case of poner shows that there is additional cross-classification with o~we:
puesto. These can be seen in (7) and (8).


(7)  g ⇒ STEMS [ SLOT1  [1]+g-
                 SLOT2  [1]-
                 SLOT3  [1]- ]


(8)  o~we ⇒ STEMS [ SLOT1  [1]+we+[2]-
                    SLOT2  [1]+o+[2]-
                    SLOT3  [1]+o+[2]- ]
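
Constraints of this form can be read as checks on a triple of stems: the co-indexed material must be the same string in every slot, and only the alternating segment differs. The Python sketch below implements such a check for constraints of the shape [1]+m+[2], using orthography (ie for /je/, ue for /we/). The function name and the PATTERNS table are assumptions introduced for illustration, and the sketch does not cover the cross-classified case of poner.

```python
def satisfies(stems, middles):
    """Check a three-slot constraint of the form [1]+m+[2] for each slot.

    `stems` holds the SLOT1 to SLOT3 strings and `middles` the alternating
    material m of each slot; [1] and [2] must be identical across slots.
    """
    slot1, slot2, slot3 = stems
    mid1, mid2, mid3 = middles
    # Try every position at which mid1 occurs in SLOT1.
    start = slot1.find(mid1)
    while start != -1:
        prefix, suffix = slot1[:start], slot1[start + len(mid1):]
        if slot2 == prefix + mid2 + suffix and slot3 == prefix + mid3 + suffix:
            return True
        start = slot1.find(mid1, start + 1)
    return False


PATTERNS = {
    "b~t": ("b", "t", "b"),     # constraint (4)
    "e~i": ("i", "e", "i"),     # constraint (5)
    "e~je": ("ie", "e", "i"),   # constraint (6)
    "o~we": ("ue", "o", "o"),   # constraint (8)
}
```

With these definitions, the stems of escribir (escrib, escrit, escrib) satisfy b~t and those of aprobar (aprueb, aprob, aprob) satisfy o~we, whereas the invariant stems of amar satisfy neither.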


⁷ Using actual phonological specifications is, of course, possible. I use orthography for simplicity,
and because in the case of Spanish the orthographic representation does not hide important
aspects of the morphology.
The pattern /ig/ is restricted to verbs whose stem ends in /a/ and that belong to the -er
conjugation: traer, caer and derivatives. At first sight, one could think this is a simple
exception, but any new verb with this shape would also take this stem pattern. If 
given a wug verb like saer, the 1sg form would be saigo. The analogical constraint 
that specifies this pattern is simple enough: /a#/, but the complexity in the ana- 
logical specification is a matter of degree. The more productive cases are only 
partially specified, and this is precisely what makes them more productive (they 
have fewer restrictions on the shape of the stems that can appear with them). 
This constraint is shown in (9). 


(9)  ig ⇒ STEMS [ SLOT1  [1]+ig-
                  SLOT2  [1]+i-
                  SLOT3  [1]+j- ]
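
The claim that any new verb of the right shape would take this pattern can be illustrated with a few lines of Python. The sketch below builds the 1sg present of an -er verb whose stem ends in a; the function name is hypothetical, and the test on the orthographic ending -aer is a simplification of the analogical constraint /a#/.

```python
def first_singular_ig(infinitive):
    """Return the 1sg present under the /ig/ pattern, or None if the verb
    is not an -er verb whose stem ends in a."""
    if not infinitive.endswith("aer"):
        return None
    stem = infinitive[:-2]        # drop the thematic vowel and the final r
    return stem + "ig" + "o"      # SLOT1 = [1]+ig-, followed by the 1sg ending -o
```

For traer and caer this gives traigo and caigo, and for the wug verb saer it gives saigo, as described above.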


The /i-je/ pattern is also very limited, only appearing in my corpus with in- 
quirir ‘to inquire’ and adquirir ‘to acquire’. Notice that in this case -quirir is not 
a verb, so neither verb is a derived form in itself, despite the presence of the 
prefixes in- and ad-. As with /ig/ before, any new verb that would take the form 
-quir- in its stem, would also inflect by the /i-je/ pattern: sanquirir - sanquiero. A 
structure like the one presented in (10) captures this pattern. 


(10)  i~je ⇒ STEMS [ SLOT1  [1]+(k)+je+(r)-
                     SLOT2  [1]+(k)+i+(r)-
                     SLOT3  [1]+(k)+i+(r)- ]


I mark in parentheses the segments which will necessarily appear in the stem 
for clarity, but the constraint in (10) does not need to specify them. One might 
be tempted to suggest that these extremely restrictive patterns should specify 
their restrictions directly on the lexical items themselves. This, however, would 
be missing out on the fact that these very restrictive patterns are just an extreme 
case of the more productive patterns. This is easily captured by using the ana- 
logical/form similarity function that licenses items being in particular types. For 
example, the difference between regular ir class verbs and i-je verbs is that reg- 
ular ir class verbs have fewer formal restrictions than i-je verbs. 

As stated before, these are simply sketches, and a more formal analysis could 
probably split these patterns into more basic processes, or collapse others based 
on more general phonological specifications. The important point here is that 
the definition of the minor patterns can be done in a way that is independent 
of whatever the major pattern of the stem is. This way the interaction between
both types becomes straightforward. I will argue that the experimental results 
strongly support the observation that major and minor patterns are mostly inde- 
pendent of each other. 


8.1.4 Materials 


For this section I first extracted all verbs from a Spanish frequency list based on
subtitle corpora.⁸ From this list I extracted all lemmas using TreeTagger (Schmid
1995). This produced a list of 4271 lemmas, from which I removed all reflexive
forms, verbs without complete conjugation paradigms, and verbs whose stem is
too short to play a role in an analogical model (e.g. ir). The final list comprised
3052 verb lemmas, for which I produced all three principal parts.

Extracting the stem of the verbs was relatively easy in this case, because we 
define the infinitive stem as the verb minus the thematic vowel and final r.
Additionally, to control for orthography I replaced all letter pairs that represent a
single phoneme with a single symbol (e.g. ch → C, ll → L, etc.). Because of the
imbalance seen in the proportion of ar verbs vs all other verbs, I kept only the
300 most frequent ar verbs in the dataset, which produced an 808-verb dataset.⁹ I
present side by side statistical results from the smaller dataset and the complete 
dataset, but focus on the distributions obtained with the smaller dataset. 
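
The stem extraction step described above can be sketched in Python as follows. This is an illustration rather than the script used for the study: the function name is hypothetical, and the replacement table lists only the two letter pairs named in the text, so any further pairs would have to be added.

```python
# Letter pairs collapsed to one symbol; only the two pairs named in the text.
DIGRAPHS = {"ch": "C", "ll": "L"}


def infinitive_stem(infinitive):
    """Strip the thematic vowel and final r, then collapse digraphs."""
    if len(infinitive) < 3 or infinitive[-1] != "r" or infinitive[-2] not in "aeií":
        raise ValueError(f"not a usable Spanish infinitive: {infinitive!r}")
    stem = infinitive[:-2]
    for pair, symbol in DIGRAPHS.items():
        stem = stem.replace(pair, symbol)
    return stem
```

For example, llamar yields the stem Lam and escuchar yields escuC, while a verb like ir is rejected because no stem would remain.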


8.1.5 Results 


There are three interesting models to look at. First, we test how well our
analogical model can predict the thematic vowel of the verb. This is the basic
model, which should capture the insights mentioned before (Boyé & Cabredo
Hofherr 2004). The second model should predict the minor patterns. Finally, the
third model will deal with the combination of both dimensions, giving us the
full predictions of verb inflection classes.

Found at: https://invokeit.wordpress.com/frequency-word-lists/, visited 8-11-2016.

It is worth mentioning here that leaving all verbs in the dataset did not produce significantly
worse results in the models, but did introduce a confound when interpreting the role of
ar-non-alternating. The accuracy metrics used are somewhat sensitive to these imbalances, and
the accuracy of a model will be very high if the model always predicts the most frequent class.
This sometimes makes models over-generalize towards the more frequent class and ignore
patterns in the less frequent classes. Ultimately this is a weakness of the models I am using
which could possibly be overcome with a different approach.

We start with the model predicting the major inflection pattern. This model
only looks at the final three segments of the stem: thematic vowel ~ final.1 +
final.2 + final.3. The results are presented in Table 8.5, and the corresponding
statistics in Table 8.7.


Table 8.5: Confusion matrix for the model predicting thematic vowel of
Spanish verbs

              Reference
Prediction    ar    er    ir
ar           302    19    42
er            25   208     9
ir            51     7   225


Table 8.6: Confusion matrix for the model predicting thematic vowel of
Spanish verbs with full dataset

              Reference
Prediction    ar    er    ir
ar          2400    48   118
er            37   182     3
ir            89     3   154


First of all, the model has a very high accuracy and kappa score. It is clear
that the prediction of the thematic vowel is possible from the stem of the verb.
Somewhat worrying, however, is that the confusion between the three classes
does not follow the predictions made by the hierarchy in Figure 8.1. In the model,
er and ir show less confusion with each other than with ar. This seems to go
against the hierarchy proposed to model their morphological asymmetries.
Looking at this case alone, it appears to be a strong counterexample to the thesis
of this book. However, if instead of measuring the distance based on the errors
made by the model, we measure this distance directly on the probability matrix,
the result is very different. The distance matrices can be seen in Table 8.11 and
Table 8.12. In the reduced dataset the distances are roughly the same between the
three classes (with minor variations), while in the complete dataset there is a
strong effect in the expected direction, that is, class-er is closer to class-ir. The problem


The model had eight hidden nodes, and a decay rate of 0.09. There was no noticeable
improvement from using more structured predictors.

Table 8.7: Statistics for Table 8.5

Overall Statistics

            Accuracy : 0.8277
              95% CI : (0.8012, 0.852)
 No Information Rate : 0.4257
               Kappa : 0.737

Statistics by Class:

                     Class: ar   Class: er   Class: ir
Sensitivity              0.799       0.889       0.815
Specificity              0.880       0.948       0.905
Neg Pred Value           0.854       0.957       0.904
Balanced Accuracy        0.839       0.919       0.860
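As a rough check on how such summary figures relate to a confusion matrix, the
following sketch recomputes the overall statistics and the per-class sensitivity
and specificity from the counts in Table 8.5. It assumes the standard textbook
definitions (Cohen's kappa, and the no information rate as the share of the
largest reference class); it is not the software used to produce Table 8.7, and
the function names are hypothetical.

```python
# Rows are predictions, columns are reference classes (counts from Table 8.5).
LABELS = ["ar", "er", "ir"]
TABLE_8_5 = [
    [302, 19, 42],
    [25, 208, 9],
    [51, 7, 225],
]


def overall_stats(matrix):
    """Return (accuracy, no information rate, Cohen's kappa)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    accuracy = sum(matrix[i][i] for i in range(k)) / n
    no_information_rate = max(col_totals) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, no_information_rate, kappa


def class_stats(matrix, j):
    """Return (sensitivity, specificity) for the class with index j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    tp = matrix[j][j]
    fp = sum(matrix[j]) - tp                           # predicted j, other reference
    fn = sum(matrix[i][j] for i in range(k)) - tp      # reference j, predicted other
    tn = n - tp - fp - fn
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    acc, nir, kappa = overall_stats(TABLE_8_5)
    print("accuracy %.4f, NIR %.4f, kappa %.3f" % (acc, nir, kappa))
    for j, label in enumerate(LABELS):
        print(label, "sens %.3f, spec %.3f" % class_stats(TABLE_8_5, j))
```

With the counts above, accuracy, no information rate, kappa, sensitivity and
specificity agree with Table 8.7 up to rounding.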


Table 8.8: Statistics for Table 8.6

Overall Statistics

            Accuracy : 0.9019
              95% CI : (0.8906, 0.9121)
 No Information Rate : 0.8326
               Kappa : 0.6528

Statistics by Class:

                     Class: ar   Class: er   Class: ir
Sensitivity              0.950       0.781       0.560
Specificity              0.673       0.985       0.966
Neg Pred Value           0.731       0.981       0.956
Balanced Accuracy        0.812       0.883       0.763


Table 8.9: Distance Matrix for Table 8.5

        ar      er
er    2.25
ir    1.21    2.89


Table 8.10: Distance Matrix for Table 8.6

        ar      er
er    2.35
ir    1.06    2.92


here is that this effect is caused by the frequency imbalance between the classes.
Because class-ar has so many more members that are correctly predicted, the
overall distance of this class from the other two increases. At best, this particular
case remains inconclusive.


Table 8.11: Distance Matrix on probabilities for the reduced dataset

        ar      er
er    2.12
ir    2.05    2.19


Table 8.12: Distance Matrix on probabilities for the complete dataset

        ar      er
er    2.46
ir    2.41    1.55


Next, we try to predict the minor inflectional pattern only. We fit the same
model as before: minor pattern ~ final.1 + final.2 + final.3. The results
are shown in Table 8.13 and Table 8.14 (the overall results for the full dataset are
in Table 8.15).


With eight hidden nodes and a decay rate of 0.01.
Table 8.13: Confusion matrix for the model predicting minor inflection
patterns of Spanish verbs

                                      Reference
Prediction  b~t  e~i  e~ie    g   ig  i~ie  i~iet    L  o~ue  non-alt.  z~zc  u~ue
b~t           9    0     0    0    0     0      0    0     0         0     0     0
e~i           0   13     0    0    0     0      0    1     0         9     0     0
e~ie          0    0    31    0    0     0      0    0     0         8     1     0
g             0    0     1   40    0     0      0    0     3         4     0     0
ig            0    0     0    0    1     0      0    0     0         0     0     0
i~ie          0    0     0    0    0     2      0    0     0         0     0     0
i~iet         0    0     0    0    0     0      6    0     0         0     0     0
L             0    0     0    0    0     0      0   28     0         3     0     0
o~ue          0    0     0    0    0     0      0    0    31        11     0     0
non-alt.      0   10    28    2    0     0      0    2     8       452     3     1
z~zc          0    0     1    0    0     0      0    0     0         3    85     0
u~ue          0    0     0    0    0     0      0    0     0         1     0     0


Once again, the model has a good accuracy in predicting these minor patterns, 
even those claimed to be unpredictable. This is not too surprising given the pre- 
vious studies that have already found strong phonological regularities that cor- 
relate with diphthongization. Some of the consonant patterns are in fact (almost) 
fully predictable by simple rules. Most verbs ending in /n/ are of class-g, while 
all verbs that end in /a/ are of class-ig. This is interesting because it means that 
this particular tree is a mix of fully and partially predictable classes, which lends 
support to the claim that the filter that assigns stems to types can go from a fixed 
simple constraint to a more complex pattern. Finally, non-alternating is indeed 
the default class, with the lowest negative predictive value. Remember that the 
negative predictive value represents how many false positives are in a given class. 
The class with the lowest negative predictive value is the class where most errors 
from other classes are grouped. Whenever the model does not know what class 
an item should be assigned to, it assigns it to the default class. 

For the last case we try to predict the complete conjugation of the verb (i.e. the
thematic vowel and minor inflection pattern together). The model is once more
the same: conjugation ~ final.1 + final.2 + final.3. The corresponding
heat map is shown in Figure 8.4, and the corresponding statistics in Table 8.16.

These results show that ar-non-alternating is still the class with lowest nega- 


With eight hidden nodes and a decay rate of 0.01. 
Table 8.14: Overall and by class statistics for Table 8.13

Overall Statistics

            Accuracy : 0.8762
              95% CI : (0.8515, 0.8982)
 No Information Rate : 0.6077
               Kappa : 0.792

Statistics by Class:

                     Class: b~t   Class: e~i   Class: e~ie   Class: g
Sensitivity               1.000        0.565         0.508      0.952
Specificity               1.000        0.987         0.988      0.990
Neg Pred Value            1.000        0.987         0.961      0.997
Balanced Accuracy         1.000        0.776         0.748      0.971

                     Class: ig   Class: u~ue   Class: i~ie   Class: i~iet
Sensitivity              1.000         0.000         1.000          1.000
Specificity              1.000         0.999         1.000          1.000
Neg Pred Value           1.000         0.999         1.000          1.000
Balanced Accuracy        1.000         0.499         1.000          1.000

                     Class: L   Class: o~ue   Class: non-alt   Class: z~zc
Sensitivity             0.903         0.738            0.921         0.955
Specificity             0.996         0.986            0.830         0.994
Neg Pred Value          0.996         0.986            0.870         0.994
Balanced Accuracy       0.950         0.862            0.875         0.975


Table 8.15: Overall and by class statistics for model predicting minor
patterns on the full dataset

Overall Statistics

            Accuracy : 0.9268
              95% CI : (0.917, 0.9358)
 No Information Rate : 0.8672
               Kappa : 0.6888
Figure 8.4: Heat map for the model predicting inflection class of Spanish verbs
Table 8.16: Overall and by class statistics for Figure 8.4

Overall Statistics

            Accuracy : 0.7772
              95% CI : (0.7469, 0.8055)
 No Information Rate : 0.3329
               Kappa : 0.7313

Statistics by Class:

                     Class: ar-e-ie   Class: ar-non-alt   Class: er-e-ie   Class: er-g
Sensitivity                  0.2500              0.7695           0.0588        0.9355
Specificity                  0.9975              0.8794           0.9924        0.9961
Neg Pred Value               0.9888              0.8843           0.9800        0.9974
Balanced Accuracy            0.6237              0.8245           0.5256        0.9658

                     Class: er-ig   Class: er-o-ue   Class: er-non-alt   Class: er-z-zc
Sensitivity                1.0000           0.6818              0.7595           0.9589
Specificity                1.0000           0.9924              0.9561           0.9918
Neg Pred Value             1.0000           0.9911              0.9735           0.9959
Balanced Accuracy          1.0000           0.8371              0.8578           0.9754

                     Class: ir-e-i   Class: ir-e-ie   Class: ir-g   Class: ir-i-iet
Sensitivity                 0.4348           0.6562        0.9091            1.0000
Specificity                 0.9898           0.9910        0.9912            0.9975
Neg Pred Value              0.9835           0.9859        0.9987            1.0000
Balanced Accuracy           0.7123           0.8236        0.9502            0.9988

                     Class: ir-L   Class: ir-non-alt   Class: ir-z-zc   Class: ar-o-ue
Sensitivity               0.9355              0.8462           1.0000           0.5000
Specificity               0.9974              0.9639           0.9987           0.9899
Neg Pred Value            0.9974              0.9668           1.0000           0.9886
Balanced Accuracy         0.9665              0.9050           0.9994           0.7449

                     Class: ir-b-t   Class: ir-i-ie   Class: ir-o-ue   Class: ar-u-ue
Sensitivity                 1.0000           0.5000           0.0000           0.0000
Specificity                 0.9987           1.0000           1.0000           1.0000
Neg Pred Value              1.0000           0.9988           0.9975           0.9988
Balanced Accuracy           0.9994           0.7500           0.5000           0.5000


tive predictive value, which means it is the default class for our model, as
predicted. Most of the other classes are predictable to some degree, with some
diphthongization classes having little predictability, like ir-o-ue and ar-u-ue.
These are, however, extremely infrequent, with 2 and 1 frequency counts,
respectively. It is not surprising that such low-frequency classes should be hard or
impossible to predict. It is also expected that combining both dimensions causes
some classes to have low predictability. After all, we use the same three
predictors to predict sixteen classes, instead of the three and eight from before. The
validation results of this final model are presented in Figure 8.5.

Figure 8.5: Overall validation for the model predicting inflection class
of Spanish verbs (accuracy and kappa for the additive and subtractive models,
by factor: baseline, final.1, final.2, final.3)

The results of the MDS and clustering are shown in Figure 8.6. These clusters
exhibit several interesting properties. First, the types ar-non-alternating,
er-non-alternating, and ir-non-alternating all sit in the corners of the space. These
are maximally different from each other. The color clustering seems less
insightful in this case than the MDS, but some groups do form nicely. The least
insightful cluster is probably the lilac one in the lower left quadrant with the
patterns ir-b-t and ir-z-zc, and directly beside this one (in light orange) the
alternations er-ig and ir-L. These two clusters do not seem to follow any pattern,
but then again, there is little organization to them. In red we have a clear cluster
of ir-g and er-g, and in dark blue we see a similar situation with the cluster
ar-o-ue and er-o-ue. These two clusters organize according to the stem patterns,
and not according to the thematic vowels. The class ir-o-ue is very close in the
plane to the other two o-ue patterns, but the hierarchical clustering analysis
groups it together with the ir alternations ir-i-ie and ir-i-iet. In this case the
thematic vowel seems to be more important for the organization of these three
patterns. In light blue we have the classes ir-a-ie, ir-e-i and ir-e-ie. Here we see
again three classes that basically cluster around stem patterns.

The clusters are by no means perfect, but they do match the proposed hierar- 
chy to some extent: there are three major inflection patterns that correspond to 
the thematic vowel, and there are some minor conjugation patterns that cross- 
classify with these. 

Figure 8.6: Multidimensional scaling with hierarchical clustering for
label colors

Some of these effects seem to contradict the claim by Albright (2009) that
analogical effects are local to the three major classes. These results show that
analogical effects between minor patterns run across these three main classes.
Although there is no clear explanation for why some clusters prefer to form around
the thematic vowel, while others group around the stem patterns, it seems clear
that there must be analogical relations that run through both subtrees of the
inflection hierarchy of Spanish verbs. In the model I propose here, all dimensions
of the hierarchy can carry some analogical information. However, which
dimensions will matter most, or where the strongest similarities will be found, cannot
be determined by any particular property of the hierarchy.

For Spanish, it is also interesting to compare the model to the experimental
results of Albright (2009) mentioned above. As already described, in the
original experiment, Albright et al. (2001) tested 96 native Spanish speakers on wugs
to see whether these wugs would be prone to diphthongization or not. The
authors used 33 wugs with forms like lerrar. Speakers were presented with the verb
used in a non-alternating context, like the first person plural (lerramos), and then
asked to fill in a dialog where the wug appeared in non-alternating and alternating
contexts. The authors then calculated the probability of a wug diphthongizing as
the number of speakers who produced a diphthongized form for said wug, over
the total number of speakers.

Since we are now predicting experimental data, we can use the complete
dataset (with 3000 verbs) without doing any splitting. As the experimental
dataset only contains information about mid vowel diphthongs, we have to fit
a model trained to predict only this factor. In this case, the previous formula
for fitting the model did not perform as well. A more structurally defined model
did a much better job: diphthong ~ final.1 + final.2 + pre-theme vowel *
theme vowel + n clusters.

This model also takes the final and prefinal segments of the stem, but
additionally identifies the pre-thematic vowel interacting with the thematic vowel, and
the number of consonant clusters. The reason for also adding the thematic vowel
is simple. Albright presents a model trained exclusively on ar verbs. Adding the
thematic vowel in this case means that the model knows which portion of the
dataset is most relevant when making the predictions, but also has the rest of the
dataset to learn from. This is important because our model is less capable of
making large phonological generalizations than Albright’s is, and every bit of
data matters.
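The structured predictors described here could be derived from a normalized
stem along the following lines. This is a minimal sketch under stated
assumptions: the function names are hypothetical, the pre-thematic vowel is
taken to be the last vowel of the stem, and a consonant cluster is counted once
for every maximal run of two or more adjacent consonants.

```python
VOWELS = set("aeiou")


def n_clusters(stem):
    """Count runs of two or more adjacent consonants in the stem."""
    count = 0
    run = 0
    for segment in stem:
        if segment in VOWELS:
            run = 0
        else:
            run += 1
            if run == 2:  # count each run once, when it reaches length two
                count += 1
    return count


def pre_theme_vowel(stem):
    """Return the last vowel of the stem, or None if the stem has no vowel."""
    for segment in reversed(stem):
        if segment in VOWELS:
            return segment
    return None


def diphthong_predictors(stem, theme_vowel):
    """Collect the predictors of the diphthongization model for one stem."""
    return {
        "final.1": stem[-1],
        "final.2": stem[-2] if len(stem) > 1 else "#",
        "pre_theme_vowel": pre_theme_vowel(stem),
        "theme_vowel": theme_vowel,
        "n_clusters": n_clusters(stem),
    }
```

For the stem entr (as in entrar) this gives one cluster and the pre-thematic
vowel e, while a stem like leR, where R stands in for the digraph rr, has no
cluster at all.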

Because we are now predicting probabilities, using the Linout linking function produces better
results. The model had no hidden nodes and only a skip layer.

I take any two consonants appearing together to be a consonant cluster.

When predicting the wugs, the model achieved a correlation of r = 0.59 (p <
0.05), which is quite close to the generalized context model Albright (2009)
reports on (r = 0.56). It is, however, considerably below the minimal
generalization learner (r = 0.77). The predicted probabilities in Figure 8.7 show where the
problem lies. The analogical model has difficulties with some wugs ending in
complex clusters. This is because these particular combinations are either not
present in the data (etC is missing) or very rare (otr has a frequency of 1). This
shows that the generalizations the model makes are too local, and not general
enough to capture weird-looking wugs correctly. Nevertheless, this is not a bad
performance, in the sense that the model seems to correlate to some degree
with speakers' intuitions, particularly regarding wugs that do look like observed
words. Those cases where speakers were much less likely to allow for diphthongs
are also completely disallowed by the model.
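To make the comparison concrete, the following sketch shows how an empirical
diphthongization probability and a Pearson correlation of the kind reported
above can be computed. The counts and predictions are invented for illustration
only (the figure of 96 speakers is from the text), and the function names are
hypothetical.

```python
from math import sqrt


def diphthong_rate(n_diphthongized, n_speakers):
    """Share of speakers who produced a diphthongized form for a wug."""
    return n_diphthongized / n_speakers


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented counts of diphthongizing speakers (out of 96) for four wugs.
    observed = [diphthong_rate(k, 96) for k in (12, 30, 48, 70)]
    # Invented model predictions for the same four wugs.
    predicted = [0.05, 0.40, 0.35, 0.80]
    print("r = %.2f" % pearson_r(predicted, observed))
```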


Figure 8.7: Predicted vs. observed probabilities of diphthong stems
(x-axis: predicted probabilities; y-axis: produced probabilities)


The fact that the minimal generalization learner outperforms the analogical
model means that the latter is a rougher approximation to what speakers
actually do than the former. It is likely that the analogical model better captures
the regularities of the synchronic system, but fails to distinguish between truly
productive patterns and unproductive patterns. On the other hand, a big
downside of Albright's approach is that it only predicts one categorical distinction.
In contrast, our model is capable of precise class assignment. Ultimately, a more
sophisticated system would be required to perform both tasks well:
simulating speakers' performance and making fine class predictions. Finally, wugs ending
in otr all get different empirical probabilities. This shows that the initial models
that only consider the last three segments are missing something.


8.2 Cross-classifications between plural and singular: Kasem


Kasem is a Gur language of the Grusi branch, spoken mostly in Ghana and
Burkina Faso (Naden 1988). Kasem featured prominently during the seventies 
and eighties in phonological debates (Phelps 1975; 1979; Halle 1978; de Haas 1987; 
1988) because of coalescence phenomena (see also Zaleska 2017). Like other Gur 
languages (Naden 1989), Kasem exhibits a complex system of genders and classes 
that has received relatively little attention in the literature (see Awedoba (2003) 
for some recent discussion of the Kasem gender system, and Niggli & Niggli 
(2007) for an electronic dictionary of Kasem). Kasem is traditionally analyzed as 
having 5 genders and 9 nominal classes: 


a class is considered singular if the majority of its members are singular, se- 
mantically and grammatically; and plural if the majority of its membership 
is grammatically and semantically plural. There are four singular classes 
and five plural classes. A pairing of a singular and a plural class constitutes 
a gender (Awedoba 2003: 4) 


Gender is defined with relation to the agreement of the noun with the
determiner (most adjectives do not agree at all, and those which do have
inherent markers). Awedoba (2003: 3) proposes the classification shown in Table 8.17
(adapted from the original, and with additional information from Awedoba 1980
and Awedoba 1996). I will show in the following sections that this approach is
insufficient to properly capture the complexity of the Kasem system.
Nonetheless, the organization in Table 8.17 already gives us an idea of what the problem
is (for work on the noun class systems of related languages see Brindle 2009,
Bodomo 1994, Bodomo 1997, and Dakubu 1997): there are five genders based on
agreement patterns with pronouns, and many more number markers that do
not correspond one-to-one with said genders. In a way, this is a similar situation to the
Romanian system discussed in Chapter 5.

Other sources label the genders with letters from A to E (Callow 1965).

The literature does not present any clear examples, but it is mentioned.


Table 8.17: Gender and classes in Kasem

         sg. classes                     pl. classes
Gender   noun class   marker   Det       noun class   marker   Det
1        I            ui a     wom       II           a        bam
2        III          i        dun       IV           a        yam
3        V            a        kam       VI           i        sun
4        VII          u        kom       VIII         0, du    tum
5        VII          u        kom       IX           0, ni    dun
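Since a gender is defined as a pairing of a singular and a plural class, the
classification in Table 8.17 can be thought of as a lookup from class pairs to
genders. The sketch below is only an illustration of that idea, with
hypothetical names; it also makes visible that the singular class alone does not
always determine the gender.

```python
# (singular class, plural class) -> gender, following Table 8.17.
GENDERS = {
    ("I", "II"): 1,
    ("III", "IV"): 2,
    ("V", "VI"): 3,
    ("VII", "VIII"): 4,
    ("VII", "IX"): 5,
}


def gender_of(sg_class, pl_class):
    """Return the gender for a class pairing, or None if it is not listed."""
    return GENDERS.get((sg_class, pl_class))


def genders_for_singular(sg_class):
    """Return all genders compatible with a given singular class."""
    return sorted(g for (sg, _), g in GENDERS.items() if sg == sg_class)
```

Singular class VII, for instance, is compatible with both Gender 4 and Gender 5,
so the plural class is needed to tell them apart.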


Awedoba (1980: 249) admits that the markers in Table 8.17 are only the ones he 
considers to be the most frequent in the language, and that there are other less 
frequent ones (I will present several additional markers in the following sections). 
Since the author does not provide an explicit list of all the markers and the gen- 
ders they define, and because gender assignment is defined by the combination 
of a noun's singular and plural markers, I will not focus on gender, but rather on 
the question of how number markers get assigned to nouns. This question has
been studied before. Some semantic regularities seem to be present in the gender 
assignment patterns. Gender 1 mostly contains human nouns. Gender 2 contains 
fruit names and body parts, among others. Gender 3 also contains body parts and 
names of fruits, but also animals, trees and other plants. Gender 4 seems to be 
the default class, and Gender 5 is claimed to only contain some 20 nouns mostly 
related to domestic items. The author concludes that 


Kasem Genders are not based on a grouping of homogeneous items. While 
a gender may contain items from several semantic categories, no gender 
can be said to monopolise absolutely nouns belonging to any one semantic 
category (Awedoba 2003: 7) 


A further complication for the semantic analysis is that stems can belong to 
multiple genders. So, while the term for a Kasem person kasino belongs to Gender 
1, the term for the language kasini belongs to Gender 2. Similarly, diminutives 
belong to Gender 1, even if the stem belongs to any of the other genders. 


There is also the more practical problem that the dictionary does not contain gender
information. This means that gender can only be inferred from the markers themselves.

Some hints towards the possibility of formal analogical relations are already 
present, although not spelled out (notice that the author does not tell us what 
the underlying forms would be in the proposed examples): 


While the semantic bases of the genders cannot be denied, phonology does 
also play a role in the allocation of nouns to classes and genders [...] The fi- 
nal syllable of a noun, especially the quality of the final vowel, plays some 
role in the allocation of nouns to their genders. For example, although bugo 
‘river’, Gender 3 and buga ‘tiredness’, Gender 2 appear to be homophones 
they are assigned to different classes and genders not necessarily on se- 
mantic grounds but perhaps on account of their suffixes, which happen on 
the surface to be identical but not in deep structure (Awedoba 2003: 12-13) 


Similarly, Awedoba (1980: 250) had already observed (informally) that gender 
assignment for loan words in Kasem follows semantic and phonological analogy. 

Another important data point mentioned by Awedoba (2003: 13) (first found
in Awedoba (1996)), but not discussed with relation to the analogical relations in
the system, is the fact that noun-adjective compounds can have different genders
independently of the head noun in the compound. So, while ka-balana
(woman-small, ‘small woman’) belongs to Gender 3, ka-kamumul (woman-big, ‘big
woman’) belongs to Gender 4. This indicates that the adjective assigns the gender of
the compound, and not the head noun. This is interesting because it means that
formal features can easily overcome semantic features in gender assignment in
Kasem.

Kasem also has a complex tone system. However, because the dictionary I am 
relying on (Niggli & Niggli 2007) only lists the tones for the singular form, and 
it is not clear what happens to those tones in many plurals (especially when the 
number of syllables of the singular and plural are different), I will not consider 
tone in this study. 


8.2.1 ATR in Kasem 


In Kasem, as in many West African languages, there is an alternation between
[+ATR] (advanced tongue root: /u/, /i/, /ə/, /e/, /o/) and [-ATR] (/ʊ/, /ɪ/, /a/, /ɛ/,
/ɔ/) vowels (Casali 2008): /ʊ-u/, /ɪ-i/, /a-ə/, /ɛ-e/, /ɔ-o/. Words (with the exception
of compounds) have the same [ATR] specification for all their vowels:


In the simplest and most general case of ATR harmony, the vowels in any
given word are either all [+ATR] or all [-ATR]. Thus, words in which some
vowels are [+ATR] and others [-ATR] do not ordinarily (setting aside
certain common classes of exceptions) occur (Casali 2008: 496)


See the following subsection for an explanation on ATR in Kasem.


This feature, as can be seen in (11), creates minimal pairs and seems to be
lexically specified.

(11)    singular   plural    gloss                                   [ATR]
    a.  colo       cwoolu    ‘kilogram’                              +
    b.  colo       cwaalo    ‘girl that likes going out with men’    -
    c.  peeli      peelo     ‘shovel, spade’                         +
    d.  peel       peela     ‘bean cake’                             -
    e.  volu       volo      ‘traveller’                             +
    f.  valo       vala      ‘farmer’                                -
    g.  yiri       yiro      ‘type, kind’                            +
    h.  yu         yra       ‘name’                                  -

There are, however, some cases in the dictionary where it is not completely 
clear whether we are dealing with exceptions to this rule or errors in the dictio- 
nary itself: 


(12) singular plural gloss 


a. tanti tantiə ‘aunt’ 
b. yukwala yukwalı ‘headscarf’ 
c. yukwolo yukwəli ‘small skull’ 


In (12a) there is a supposedly impossible combination of /i/ and /a/, while in
the other two examples /u/ appears with both /ə/ and /a/. It is recognized that 
in ATR harmonizing languages some words may fail to show any harmony, or 
only present partial harmony (Casali 2008), but it is hard to check any of the 
particular cases in the dictionary. 

For Kasem, it is claimed that the [+ATR] feature is carried by the root, and it
then extends to the affix (Casali 2008: 501). It is mainly for this reason that I will
not consider ATR as a predictor or predicted feature of Kasem noun classes. I do
not claim that it does not play a role, but counting it in would make an already
complex system even more complex.


All Kasem examples are taken from Niggli & Niggli (2007).

In the models I neutralized ATR by converting all [-ATR] vowels to [+ATR].
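The harmony requirement and the neutralization step can be illustrated with a
small sketch. It is not the code used for the models: the function names are
hypothetical, and the pairing of [-ATR] with [+ATR] vowels below is an assumed
ten-vowel inventory written in IPA, so it will not line up with spellings that
use other symbols.

```python
# Assumed pairing of [-ATR] vowels with their [+ATR] counterparts.
TO_PLUS_ATR = {"ɪ": "i", "ɛ": "e", "a": "ə", "ɔ": "o", "ʊ": "u"}
MINUS_ATR = set(TO_PLUS_ATR)
PLUS_ATR = set(TO_PLUS_ATR.values())


def atr_values(word):
    """Return the set of ATR values ('+' and/or '-') found among the vowels."""
    values = set()
    for segment in word:
        if segment in PLUS_ATR:
            values.add("+")
        elif segment in MINUS_ATR:
            values.add("-")
    return values


def is_harmonic(word):
    """True if all vowels of the word share one ATR value."""
    return len(atr_values(word)) <= 1


def neutralize(word):
    """Convert every [-ATR] vowel to its [+ATR] counterpart."""
    return "".join(TO_PLUS_ATR.get(segment, segment) for segment in word)
```

Under these assumptions a form such as tanti, which combines /a/ and /i/, is
flagged as disharmonic, and neutralizing it yields a fully [+ATR] string.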
8.2.2 A simple analysis of Kasem noun classes


There are different takes on what the number markers in Kasem are. The ones 
I propose here are based on my own analysis of the system. Alternative models 
are of course possible, but should have little impact on the analogical system. As 
a guiding principle for my analysis, I tried to maximize morphology and
minimize phonology. Whenever there is enough evidence for a marker to be
morphologically motivated, I rejected the phonological explanation for it. This is a
conservative approach. In the worst case scenario, I am proposing more markers
than there are in the system, which means that the analogical model will have
a harder time predicting the classes. A smaller set of markers would result in a
better model.

Kasem has many different number markers, and some of these seem to be
more clear-cut than others. First, I will introduce the markers where there should
be less room for an alternative analysis, and in the following subsection I will 
introduce those cases where different approaches are possible. This runs counter 
to the standard way of analyzing Kasem. Previous takes on Kasem have tried 
to minimize the number of exponents by way of using phonological rules and 
underlying representations based on some further assumptions. So, for example, 
de Haas (1987: 184) analyses the example in (13) as having a marker -i which 
coalesces with the underlying vowel in the stem and turns into /e/, instead of 
there being a marker -e. 


(13) a. /zwa + i/ → /zwe/ ‘ear’
     b. /čwa + i/ → /čwe/ ‘liver’


However, this approach relies on the assumption that čwe and zwe belong to
Gender 2 (Class B in the original) based on the agreement with the determiners, 
and that all nouns of Gender 2 have a singular marker -i. This would make sense 
if there were compelling evidence from some other morphological process that 
shows that the stem of these words ends with /a/. In a few cases like zwe, one 
can propose that compounds provide such evidence. The example in (14) shows 
/zwa/ as a stem in three noun-adjective compounds (these are all right-headed 
compounds, in that order: noun-adjective): 


(14)    singular     plural       gloss
    a.  zwa-boo      zwa-boors    ‘hole in the ear’
    b.  zwa-kogo     zwa-kwaro    ‘deaf person’
    c.  zwa-kwana    zwa-kwana    ‘earring’


However, there is no such evidence for any of the other 52 nouns that end 
in /e/ in the singular in the dictionary, and there is even counter evidence for a 
general rule. In (15) we see what could be thought to be examples just as zwe, 
where a noun belongs to Gender 2, takes the singular marker -i and the plural 
marker -2, but because the stem ends in /5/, the /i/ surfaces as /e/. 


(15)    singular   plural         gloss
    a.  kalwe      kalwa, kali    ‘monkey’
    b.  kandwe     kandwa         ‘stone, rock’


However, compounds built from these nouns do seem to have a /a/ in the stem, 
as shown in (16) below. 


(16)    singular        plural          gloss
    a.  kalwe-faa       kalwe-faaro     ‘baboon’
    b.  kalwe-suja      kalwe-sina      ‘Red Patas Monkey’
    c.  kalwe-zwono     kalwe-zwom      ‘Green Monkey’
    d.  kandwe-gara     kandwa-gar      ‘dike’
    e.  kandwe-nyunt    kandwa-nyuna    ‘bright / shiny stone’


What this means is that even if the phonological analysis is right in the case
of zwe, we cannot automatically assume that this analysis applies to all nouns
ending in /e/. A systematic study of each case would have to be undertaken, but
because of the limitations of the dataset I am using, this is not feasible. For this
reason, I will take markers to be what they appear to be in their surface form,
unless there is clear and strong evidence to the contrary.


8.2.2.1 Basic number markers 


An important feature of Kasem is that the same number markers can appear as 
singular markers in some nouns, and as plural markers in other nouns. The main 
markers (i.e. the most common ones) are: -e, -a, -i, -o, -u, -nə and -nu. We see in 
(17) examples of the -i marker in the plural, with the -a marker in the singular. 
In (18) we have the inverse situation. In both examples there is an assumption of 
coalescence between the /i/ in the stem and the /i/ in the marker (/i+i/ → /i/). In
following sections I will discuss the possibility of an -i marker instead. 


Even in this case it is unclear that this is the right analysis. It is not obvious that the form found
in the compound is the stem, since the head noun of a compound can show some variation:
tu-mwen ‘shrub, bush, small tree’ in the singular has the form twe-mwan in the plural.

(17)    singular   plural   gloss
    a.  afidia     afid:    ‘sugar cane’
    b.  bordia     bordi    ‘plantation’

(18)    singular   plural   gloss
    a.  bi         bio      ‘counter’
    b.  pomp:      pompia   ‘water pump’

This is not the only possible analysis of these examples. One could also pos- 
tulate a zero marker for the singular and a -a marker in the plural. In this case 
the data are not enough to clearly distinguish between all the alternatives. I have 
tried to always take the most conservative approach.

Examples in (19) and (20) show the alternation between the -e marker and the 
-a marker for both singular and plural. 


(19) singular plural gloss 


a. cicwe cicwa ‘spear’ 
b. nafozwe nafozwa ‘chapped fingers’ 


(20) singular plural gloss 


a. gungwona gungwe ‘hour-glass drum’ 
b. payaa paye ‘jaw’ 


The examples in (21) and (22) show the -o and -u markers. While the -o marker 
rarely appears in the plural (and then only with another -o marker in the singu- 
lar), the -u marker can be found both for plural and singular. 


(21) singular plural gloss 
a. bolo bwoolu, bwallu ‘valley, low land’ 
b. tasoro taswaaro ‘flint lighter, lighter’ 
(22) singular plural gloss 
a. yukolo yukollo ‘skull’ 
b. yume yuina ‘security guard, warden’ 
c. tiabu tiabia ‘cat’ 


²² For purposes of the models, in these cases the stem was taken to be pomp or b, without an 
additional -i. 
Finally, in (23) and (24) we see some examples of the -nu and -nə markers. 
Both markers are almost exclusively found in the plural. The marker -nu always 
appears with lengthening of either the vowel or the consonant, and can only 
co-occur with either -7» or -nu in the singular, while the marker -nə can appear 
without lengthening in certain cases and is less restricted in terms of the singular 
markers it can combine with, although it tends to pair with -m. 


(23) singular plural gloss 


a. dono daano ‘sticks to support a flat roof’ 
b. lulugu lulunnu ‘perspiration’ 


(24) singular plural gloss 
a. jazim jazina ‘right hand’ 
b. zuya zuna ‘bird’ 


These are the simple, straightforward number markers in Kasem. These exam- 
ples show that the language allows for reversals (Baerman 2007), where pairs of 
markers flip their value depending on the noun. This will be one important point 
in the analysis. 
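
The notion of a reversal can be made concrete with a small sketch. The snippet below is illustrative only: it takes a list of (singular marker, plural marker) pairs and reports every pair of markers that is attested in both directions. The pairs are read off examples (17) to (20) as transcribed above, and the function name `find_reversals` is hypothetical, not part of the original study.

```python
def find_reversals(pairs):
    """Return the unordered marker pairs that are attested in both directions."""
    attested = set(pairs)
    reversals = set()
    for sg, pl in attested:
        # A reversal needs two distinct markers whose roles can be swapped.
        if sg != pl and (pl, sg) in attested:
            reversals.add(frozenset((sg, pl)))
    return reversals


# (singular marker, plural marker) pairs as read off examples (17)-(20)
marker_pairs = [("a", "i"), ("i", "a"), ("e", "a"), ("a", "e")]

for pair in sorted(sorted(p) for p in find_reversals(marker_pairs)):
    print(pair)
```

Under these assumptions both a/i and a/e come out as reversing pairs, while a pairing attested in only one direction would not be reported.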


8.2.2.2 The -ŋ- and -g- markers 


I now turn to less straightforward cases. Many words show a /ŋ/ segment in the 
singular that does not appear in the plural. Sometimes this segment is the final 
segment in the word, but it is mostly followed by what appears to be a regular 
singular marker like those discussed above. For this reason it has been claimed 
that the /ŋ/ is part of the singular stem, and that it tends to disappear in the plural 
(Callow 1965; Awedoba 1980). Thus, examples like those in (25) are analyzed as 
having an -ə marker in the singular and an -e marker in the plural. This, however, 
is no different from claiming that /ŋ/ is a singular marker which alternates with 
other markers for the plural, with the caveat that it can then somewhat freely 
combine with additional singular markers. There does not seem to be anything 
special about these examples that makes them different from others. 


(25) singular plural gloss 


a. wu-sarna wu-se ‘second flute’ 

b. baya-pwono baya-pwoonu ‘illness where the eyes, 
feet and hands are swollen’ 

c. bugeni-zuga bugani-zuno ‘stork’ 
It is then worth asking whether we are dealing with two co-occurring markers 
-ŋ- and -ə (in a case of multiple exponence), or if there is an additional, independent 
marker -ŋə. Looking more closely it becomes clear that -ŋ- can appear with 
-a, -o and -u. Some examples are given in (26). These examples show that the 
marker -ŋV often alternates with -nu, but not necessarily, which is evidence that 
these are co-occurring markers. 


(26) singular plural gloss 
a. mung nyta,nyt ‘horn’ 
b. bwana bwe ‘adultery’ 
c. logo lwaans ‘distance, length, surface’ 
d. bulogo  bulwonnu ‘liana’ 
e. kunu kunnu ‘Bohor Reedbuck’ 
f. bono bonno ‘root’ 


An additional argument against the phonological analysis that states that /ŋ/ 
is in the stem and gets deleted in the plural can be seen in (27), where an apparent 
-ŋV alternates with a -ŋa marker, or an -i or -ia. Although it is hard to distinguish 
between both alternatives, /ŋ/ is not simply deleted in the plural. 


(27) SG tito PL titona, tutwia ‘work, occupation’ 


The existence of the five examples in (28) makes things more complex, because 
here -ŋ appears as a marker on its own. As we will see later, there is a Ø marker 
in Kasem, which means this could be a case of -ŋ-Ø, but also simply a final -ŋ 
marker. 


(28) singular plural gloss 
a. don donna ‘mate, fellow, friend’ 
b. badon badonna ‘friend, colleague, comrade’ 
c. cilon cilonne, ciloona ‘friend’ 
d. ka-don ka-donnə ‘fellow wife’ 
e. yuudoy  yuudonno, yuudwoono ‘mate, friend of same age, 
comrade’ 


A marker similar to the -ŋ- marker just discussed is the -g- marker. Like -ŋ-, 
this marker can also only appear with -a, -o and -u, and it exclusively marks 
singular. Some examples are given under (29). 
(29) singular plural gloss 
a. gar-diga gar-di ‘mosquito net’ 
b. juga ju, je ‘place, location’ 
c. pogo pweru ‘spider's web’ 
d. sogo som, sunt ‘knife, razor’ 
e. kajugu kajuru ‘head pad for carrying loads’ 


The distribution of these -gV markers with the corresponding plural markers 
is also not very restricted, particularly for -ga. Callow (1965) also claims that this 
marker is a stem phoneme that undergoes a phonological deletion process. 

The claim that ŋ and g are part of the stem is not well argued for in the literature. 
One argument in favour of this kind of analysis seems to be based on evidence 
from compounds like those in (30). The assumption is that singular markers 
cannot appear inside compounds. 


(30) singular plural gloss 


a. zona zuno ‘bowl, calabash’ 
b. zop-bio  zog-bi ‘calabash used for measuring’ 
c. zoy-dia  zon-di ‘calabash for eating food, eating bowl’ 


This kind of evidence is rather weak and not very systematic, however. For 
example, in cases like those in (31), the /g/ segment does not appear in the com- 
pounds of the noun, so one could just as well say that based on this evidence, -g- 
has to be a marker. 


(31) singular plural gloss 
a. digo di ‘hut, room, house’ 
b. di-nia di-ni ‘married woman’s principal room’ 
c. di-yuu di-yum ‘woman’s annex room, 
inner kitchen in the rainy season’ 


Similarly, some compounds use the complete singular form of the noun, like 
those in (32). 


(32) singular plural gloss 


a. sono swanno ‘shea-nut tree’ 
b. sogo-sabara sono-sabari ‘tree species’ 


Thus, evidence from compounds to infer stems is contradictory. 
Finally, whether we should consider -ŋ- and -g- as independent markers or postulate 
at least six -[+velar]V markers seems to be a secondary issue. As a middle 
ground, I posit a system where -ŋ- and -g- can combine with other singular markers, 
while being markers on their own. Unlike the -a, -o and -u markers that -ŋ- and 
-g- can combine with, -ŋ- and -g- are (almost) exclusively singular markers. In 
the end, however, this will not make any difference for the analogical models. 


8.2.2.3 The -r- marker 


A similar situation arises in the plural with the -rV²³ markers. The examples in 
(33) show the -r- marker, which almost exclusively appears in the plural (with 
the exception of the two words in (34)). We find -r- appearing mostly with -a 
and -u, and only in a few cases with -o. Additionally, the -ru combination is found 
co-occurring with quite a few different singular markers. 


(33) singular plural gloss 
a. ba-dogo ba-doro ‘sterile man’ 
b. cibu-pogo cibu-pweru ‘chick of about one month’ 
c. dudu duduura ‘musical instrument’ 
d. tabulo taabuloro ‘black board’ 


The examples in (34) show that there are at least two apparent exceptions 
where -ru appears in the singular. It is hard to know how to interpret these cases. 
It could be that -r- can in fact appear in the singular but is dispreferred, or it 
could be that these are special cases that require a different kind of analysis. 


(34) singular plural gloss 


a. baro banna ‘husband, partner’ 
b. kan-baro kan-banna ‘husband’ 


8.2.2.4 The -m marker 


A particularly hard case is found in the -Vm/-nV pairs, like those shown in (35). 


(35) singular plural gloss 
a. badam badono ‘bachelor’ 
b. bani-nyim banı-nyına ‘disrespectful person’ 
c. dom dona 'enemy' 


²³ In earlier works it is common to find a reference to a marker du instead. This seems to be 
because /r/ and /d/ are allophones in the language. Since the source I am using uses /r/, I will 
use this notation. 
There are several possible analyses for these examples. The more phonological 
one would suggest a sort of coalescence process between an /m/ segment of the 
stem and the -nV marker. Alternatively, one could argue that the fact that the 
sequence /mV/ is not found in singular forms suggests that the vowel is turning 
the /m/ into an /n/, and the fact that the final vowel of the singular is often kept 
in the plural strongly suggests that the stem ends in /m/, and these are examples 
of nouns without a singular marker. There are, however, several facts that speak 
against a phonological explanation. First of all, pairs like these can be found for 
the plural (with lower frequency, however): 


(36) SG balojana PL balejam ‘Buzzard’ 


If this were a purely phonological process, the symmetry would be somewhat suspicious. 
In particular, cases like those in (37) are more in line with an -m marker 
than with an /m/ stem and coalescence. 


(37) singular plural gloss 


a. beesim beesa ‘torment, torture, oppression’ 


b. kadagum kadagwi ‘kind of sorghum’ 


Although one could postulate an /m/ deletion rule, this overly complicates what 
could be a straightforward system. This is even clearer from the perspective 
of the plural, especially in cases with overabundance such as those shown in (38). 


(38) singular plural gloss 
a. di-yuu  di-yum ‘woman's annex room, 
inner kitchen in the rainy season’ 
b. ga-sugu  ga-sum ‘wild Guinea fowl’ 
c. sono sam, sanı ‘house, compound’ 
d. sugu sum, suni ‘guinea-fowl’ 
e. sugu som, soni ‘knife, razor, cutlass’ 


These examples are strong evidence that this is not a phonological process, 
but rather a morphological one. I will thus consider -m to be a marker in its own 
right. 


8.2.2.5 The -iə marker 


This particular marker is even harder to argue for, particularly in the light of the 
-ə marker (discussed above). For most cases, it is not completely clear whether 
we are dealing with a -ia/-i class, or with a -a/-Ø class, where either the plural or 
singular is expressed by a zero marker. In (39) we see a couple of examples: 
(39) singular plural gloss 


a. manjısı manjisia ‘matches’ 
b. miamia miami ‘imported body creams/lotions’ 


This is especially difficult in cases where the opposing marker is an -e, since 
one could just as well postulate a phonological rule which reduces /ie/ into /e/. 


(40) singular plural gloss 


a. kwer-dia kwor-de ‘loud voice’ 
b. kunku-bta kunku-be ‘soldier termite’ 


For both examples either analysis would work. The only clear evidence we 
have for an -ia marker comes from a few examples where nouns have a /ia/ in 
the plural and something else in the singular, or where we get a clearly different 
plural marker: 


(41) singular plural gloss 
a. dudwe dindwia ‘dream’ 
b. ga-digobu ga-digabia ‘African wild cat’ 
c. kabal-bu  kabal-bia ‘small soup-bowl for sauce’ 
d. nanio naniina ‘cow’ 


I will assume an -ia marker, but acknowledge that there are many cases where 
it is not completely straightforward, from the dictionary alone, to determine 
whether we are actually dealing with an -ia marker or an -a marker. 


8.2.2.6 The -n marker 


Some examples like those in (42) show for both singular and plural what appears 
to be an -n marker. 


(42) singular plural gloss 
a. buga-nyvan buge-nywin ‘plant’ 
b. gwion gwin ‘Yellow-billed Shrike’ 
c. bocwen bocwan ‘goat that has not yet given birth’ 
d. bu-kwion bu-kwuro ‘adolescent’ 
e. bana ben ‘bracelet, bangle, metal ring’ 


In this case one could, as before, postulate an additional series of -Vn markers, 
or an -n marker which can co-occur with other singular and plural markers. 
Since there does not appear to be evidence that could distinguish between the two 
hypotheses, I will assume that this is again a case of multiple exponence, but the 
alternative should not have any impact on the implementation of the model. 


8.2.2.7 Three minor markers: the -iine, -si and Ø markers 


The final two segmental markers are the marker -iine, shown in (43), and the -si 
marker in (44). 


(43) singular plural gloss 


a. bar-nu bar-niina ‘mother-in-law’ 
b. fito-tu fita-tiina ‘mechanic, fitter’ 


(44) singular plural gloss 


a. do-baga do-bagsi ‘thunder’ 


b. ga-cawaka ga-cawagsi ‘shrub species’ 


These two markers are infrequent and are not featured in the literature, but 
it seems unlikely that they could be analyzed as resulting from phonological 
processes. 

Finally, there is a Ø marker. This marker is rather rare, with only 15 examples, 
12 of which end in /[+velar]a/ in the singular. Of course, a no-marker alternative 
works equally well and makes no real difference for the analysis. A phonological 
explanation could work for those cases where there is a final vowel (like in (45d)), 
in which one could postulate coalescence between the vowel in the stem and the 
marker, and thus we do not see any extra marker. But this explanation is much 
less likely for the examples with a consonant ending. 


(45) singular plural gloss 
a. kon koona ‘Roan Antelope, Kob’ 
b. kwan kwan ‘water-lily’ 
c. plan plaanro ‘plan, map’ 
d. mancıga mancı ‘manioc, cassava’ 
e. gar-digə gar-di ‘mosquito net’ 
f. bancıga bana ‘manioc, cassava’ 


In these examples it is clear that forms like mancı or bancı have no plural 
marker because the singular contains them entirely and adds some additional 
marker which does not otherwise combine with, or follow, a vocalic marker (i.e. -ga 
does not follow an -i marker). 
8.2.2.8 Lengthening and diphthongization 


There are two phonological processes found in Kasem which seem to mark plu- 
rality in addition to the individual segmental markers presented before. These 
are: lengthening of the stem and diphthongization of the last vowel of the stem. 


(46) singular plural gloss 
a. logo lweru ‘hole dug for planting seed, seed-hole’ 
b. gww ywurs ‘wage, payment’ 
c. pulu pullu ‘granary made of straw’ 


In (46b) we see that the lengthening can be of the last vowel and in (46c) we 
see that it can be of the last consonant. This strongly speaks for a mora insertion 
which can either attach to the consonant or vowel. This analysis is supported by 
some overabundant examples where both effects are found. In (47) we see that 
this phenomenon is even independent of the additional segmental plural marker 
chosen. 


(47) singular plural gloss 
a. coro corro, coors ‘black make-up’ 
b. voro vann, vaano ‘hoe’ 


Especially interesting are the cases where both processes (i.e. lengthening and 
diphthongization) occur on the same word as shown in (48). 


(48) singular plural gloss 
a. buga-kanyono  buge-kanywanno ‘kind of tree’ 
b. yolo ywollu ‘empty area / field, 
empty space outside village’ 
c. colo cwaalo ‘girl that likes going out 
with men’ 
d. war-boro war-bwooru ‘brick mould / mold’ 


8.2.2.9 Other stem changes 


Some nouns show some sort of unpredictable stem changes, mostly in velar seg- 
ments as seen in (49). 
(49) singular plural gloss 
a. coro ceeni, ceenu ‘hen, fowl, chicken’ 
b. bono bannu ‘dung, shit’ 
c. biboku  bibagaru ‘stutterer’ 
d. ciko ciguo ‘trap’ 
e. cicogo cikoro ‘feather of fowls’ 


I do not consider suppletion among the classes for the analogical model, but 
in principle this could also be a dimension of noun inflection. 


8.2.2.10 Compounds 


For most compounds, the only part that changes is the rightmost (the adjective). 
There are, however, exceptions with compounds with the word kandwe ‘stone’, 
among some others as in (50). 


(50) singular plural gloss 
a. kandwe-nyun kandwa-nyuna ‘bright / shiny stone’ 
b. kandwe-gont kandwa-rona ‘precious / bright stone, 
jewels, pearl’ 
c. kandwe-pisunt  kandwa-pisuna ‘pile / heap of stones’ 
d. kandwe-poloro  kandwa-palwaary ‘rock’ 
e. kandwe-poporo  kandwa-poporro ‘stone bracelet’ 
f. kunkwen-pogo kunkway-pwasnu ‘Red-eyed Dove, 
collared dove’ 


I will leave this case as an open problem since the data are not conclusive as 
to why some compounds can inflect for their head noun and others do not. 


8.2.3 Materials 


The dataset, as well as all examples cited here, comes from the Kasem Burkina 
Faso Dictionary (Niggli & Niggli 2007) in its online version.²⁴ The dictionary 
lists for each noun its singular and plural forms, as well as the tones for the singular 
form. The tones for the plural form are only listed in a few exceptional 
cases, which seems to suggest that the plural and singular forms have the same 
tones. This, however, is hard to extrapolate to words where the plural is longer 
or shorter than the singular. From the 2000 nouns listed in the dictionary, I removed 


²⁴ http://kassem-bf.webonary.org/, visited on 10-11-2016. 
30 cases where either the marker was completely unclear, the plural showed unpredictable 
suppletion, or where there was reason to suspect an error (e.g. nouns 
where the ATR feature did not match across all their vowels), and ended up 
with a total of 1970 nouns. 

For the two nouns in (51) the dictionary presented an alternative in the singular. 
For both these cases I only considered the main form. 


(51) a. Kean (kwe) ‘Striped Ground Squirrel’ 


b. se (swe) ‘ivory bracelet’ 

In the cases of polysemy I left all entries in the table: 


(52) a. ni ‘opening of a room/house, gate’ 
b. ni ‘mouth, beak’ 


c. ... 


As we have seen in multiple examples already, Kasem, just like Hausa, presents 
some overabundance in the plural forms: 


(53) singular plural gloss 


a. bwana  bwant, bwam ‘mosquito’ 
b. borgo boni, bom ‘goat’ 


In all these cases I only considered the first plural listed. The reason is that the 
dictionary only lists 108 nouns with overabundant plurals. This is not enough to 
be able to reliably model overabundance in this case. 

For roughly half of the nouns, the dictionary included a semantic annotation 
which consists of some basic groupings like ‘animal’, ‘human’, ‘animate’, etc., 
coded with numbers. I use this semantic annotation in the analogical models. As 
for the nouns without semantic coding, I assigned them to a default class. 


8.2.4 Modelling the system 


After the previous discussion it is useful to look at the pairings between segmen- 
tal singular and plural markers. Table 8.18 shows the number of nouns for which 
a given pairing holds (ignoring overabundant cases), after neutralizing ATR. The 
table also ignores lengthening and diphthongization. Table 8.19 shows the co- 
occurrences of plural markers with either lengthening of the vowel (VV), the 
consonant (CC), and with the presence or absence of diphthongization. 
Table 8.18: Co-occurrence of singular and plural markers 


[Table: singular markers in rows, plural markers in columns; each cell gives the number of nouns showing that pairing.] 
Table 8.20: Co-occurrence of lengthening and diphthongization 


diphthongization no-diphthongization 


CC-lengthening 24 117 
no-lengthening 108 1092 
VV-lengthening 228 362 


If we cross-classify all factors the result is 144 nonempty classes (ignoring 
ATR), with most classes having fewer than 50 members, and 63 classes of only 1 
member. Because of this, a flat list of inflection classes looks particularly unconvincing. 
A more straightforward approach is to use cross-classification as with 
the Spanish systems. 

To model the complete space of inflectional classes several trees are required. 
The first thing we have to recognize is that markers like -i are not in themselves 
plural or singular markers, but simply number markers. Whether they indicate 
plural or singular depends on their distribution with other markers. There are 
two alternatives at this point: either overspecification as in Figure 8.8, or underspecification 
as in Figure 8.9. 


number 
singular plural 
- -y- -e -ə di Ø -o -u -si -iine 


Figure 8.8: Kasem number markers with overspecification 


number 


singular -e -ə -i Ø -o -u plural 


a ia 


-g- 0: -si -iine 


Figure 8.9: Kasem number markers with underspecification 




For the purposes of this study either alternative would work equally well. For 
simplicity I will go with the underspecification approach in Figure 8.9. 

Lengthening and diphthongization are processes which are completely independent 
of the segmental markers, but from Table 8.19 it should be clear that the 
distribution of plural markers is not random with regard to the classes they co-occur 
with. Both are much more likely with -u markers, and lengthening of the 
vowel is also very likely with -a. Similarly, we see that while -ru is very likely to 
co-occur with lengthening of the vowel, it only co-occurs once with lengthening 
of the consonant, as shown in (54). 


(54) SG ywam-pogo PL rwam-porro ‘scale of wound’ 


Similarly, as can be seen in Table 8.20, the proportion of words with no lengthening 
in the plural but diphthongization is around 10%, while that of CC-lengthening 
and diphthongization is around 20%, and the proportion of nouns with 
diphthongization and VV-lengthening is almost 40%. These are clearly not 
random distributions.²⁵ What this means is that our model for cross-inheritance 
should consider all four factors: segmental markers of the singular, segmental 
markers of the plural, lengthening and diphthongization. 
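
The proportions just cited can be recomputed directly from the counts in Table 8.20. The following is a minimal sketch for checking the arithmetic; the variable and function names are illustrative and are not taken from the original analysis.

```python
# Counts from Table 8.20: (with diphthongization, without diphthongization)
table_8_20 = {
    "CC-lengthening": (24, 117),
    "no-lengthening": (108, 1092),
    "VV-lengthening": (228, 362),
}


def diphthong_share(counts):
    """Proportion of nouns with diphthongization within one lengthening type."""
    with_diphthong, without_diphthong = counts
    return with_diphthong / (with_diphthong + without_diphthong)


shares = {name: diphthong_share(c) for name, c in table_8_20.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

The resulting shares are roughly 9% for nouns without lengthening, 17% for CC-lengthening and 39% for VV-lengthening, in line with the rounded figures given in the text.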

Because lengthening and diphthongization only occur on the stem, these two 
dimensions can also be modelled with a stem space. For this, we have to postulate 
that Kasem nouns have a singular and a plural stem. Alternatively, nonconcate- 
native morphological processes could also be used to account for these changes. 
In the end, the important thing is that all nouns must be specified for whether 
they undergo these processes or not. The partial trees for lengthening and diph- 
thongization can be trivially defined as in Figure 8.10 and Figure 8.11. 


lengthening 


no-length length 


vowel consonant 


Figure 8.10: Hierarchy for lengthening in Kasem 


²⁵ I skip statistical tests here because I will show this is the case with the models in the next 
section. 
diphthongization 


no-diphthong diphthong 


Figure 8.11: Hierarchy for diphthongization in Kasem 


Figure 8.12 shows a partial hierarchy with all dimensions of Kasem noun inflection 
class. Segmental markers constitute a hierarchy of their own, which specifies 
which markers combine with which other markers. Underspecified markers can 
mark either singular or plural, and the combination of two of these underspecified 
markers means that both alternatives are available.²⁶ The complete inflection 
class of a noun is given by the combination sg-pl-diphth-length. 

Every noun in Kasem must be typed for its complete inflection class. In Figure 8.12 
the lexeme alapil ‘aeroplane’ belongs to class i-ə-ndiphth-nl, which 
means it takes an i in the singular, an ə in the plural, and its stem does not undergo 
diphthongization or lengthening. How different theories choose to realize 
these properties is an independent problem. 
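
As an illustration of how the four dimensions combine into one class label, here is a small sketch. It is not the formalization used in this book: the class name `InflectionClass` and the abbreviations `vv` and `cc` are assumptions made for the example, while `ndiphth` and `nl` follow the label given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InflectionClass:
    sg: str            # segmental singular marker
    pl: str            # segmental plural marker
    diphthong: bool    # does the plural stem undergo diphthongization?
    lengthening: str   # "none", "vowel" or "consonant"

    def label(self) -> str:
        """Build a label of the form sg-pl-diphth-length."""
        diphth = "diphth" if self.diphthong else "ndiphth"
        # "nl" follows the text; "vv" and "cc" are assumed abbreviations.
        length = {"none": "nl", "vowel": "vv", "consonant": "cc"}[self.lengthening]
        return f"{self.sg}-{self.pl}-{diphth}-{length}"


# alapil 'aeroplane': i in the singular, ə in the plural, no stem changes
alapil = InflectionClass(sg="i", pl="ə", diphthong=False, lengthening="none")
print(alapil.label())
```

Keeping the four dimensions as separate fields mirrors the cross-classification: each field can be predicted on its own, and the full class is simply their combination.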


8.2.5 Methodological considerations 
8.2.5.1 Predictability between subtrees 


In several of the models below, when predicting a subtree (e.g. lengthening), I will 
include information from another subtree (e.g. diphthongization). From a theoretical 
perspective, this works in a different way than the stem information. Adding 
information about a cross-classifying tree is equivalent to removing a subset of 
the possible classes. In the toy example in Figure 8.13, two subtrees, τ and σ, cross-classify 
to build the inflection classes for the lexemes w1 to w9. If an analogical 
model predicting τ for the words w1 to w9 knows σ, it will not have to decide 
between three classes, but at most two. For words w7 to w9, the type s2 uniquely 
determines that these words belong to type t3, because it removes the possibility 
that these words could belong to either t1 or t2. For words w1 to w6, the type s1 
removes the possibility of t3. 
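
The effect of knowing one subtree when predicting another can be sketched with a toy lexicon. Everything in this snippet is hypothetical: the nine words and their exact assignment to t1 and t2 are assumptions chosen to match the description above (s1 is compatible with t1 and t2, s2 only with t3).

```python
# Hypothetical toy lexicon: each word is typed for one class in each subtree.
lexicon = {
    "w1": ("t1", "s1"), "w2": ("t1", "s1"), "w3": ("t1", "s1"),
    "w4": ("t2", "s1"), "w5": ("t2", "s1"), "w6": ("t2", "s1"),
    "w7": ("t3", "s2"), "w8": ("t3", "s2"), "w9": ("t3", "s2"),
}


def candidate_t_classes(lexicon, known_s=None):
    """Return the t classes that remain possible, optionally given the s class."""
    return sorted(
        {t for t, s in lexicon.values() if known_s is None or s == known_s}
    )


print(candidate_t_classes(lexicon))        # no information about s
print(candidate_t_classes(lexicon, "s1"))  # s1 rules out t3
print(candidate_t_classes(lexicon, "s2"))  # s2 leaves only t3
```

Without information about the second subtree, all three t classes are candidates; with it, the choice shrinks to two classes or to a single one.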


²⁶ It is, however, unclear whether reversals are found for all combinations of underspecified markers. In 
other words, if x and y are underspecified, it is not clear whether both x-y and y-x necessarily exist, 
or merely could exist. 
Figure 8.13: Example of cross-classifications and information 


8.2.5.2 Compounds 


We now turn to the analogical modelling. A difficult decision regarding this par- 
ticular dataset is whether to include compounds or not. Including them means 
that, because compounds usually have the same plural marker as the simplex 
noun, the model will be able to remember some cases. That is, the cross-validation 
is not completely perfect. On the other hand, not all compounds share the same 
plural marker as their simplex form. Additionally, it is not always clear what sort 
of compounds we are actually dealing with. Some seem semantically transparent 
like those in (55a) and (55b), but others less so like those in (55f) and (55g). 


(55) singular plural gloss 
a. bana ben ‘bracelet, bangle, metal ring’ 
b. kalum-baga kalum-be ‘black bracelet (for rites)’ 
c. nyasan-bio nyasaŋ-bio ‘sesame seeds’ 
d. zor-bio zon-bi ‘calabash used for measuring’ 
e. buno boni, bom ‘goat’ 
f. buno bonno ‘root’ 
g. nwan-boryo nwan-bonno ‘capillary’ 


Finally, not all words marked as compounds in the dictionary have a corresponding 
simplex form: 
(56) singular plural gloss 
a. kalog-jaro  kalon-jara ‘fisherman’ 
b. wo-jaano wo-jaana ‘bird, insect’ 
c. kamo-moro kamo-mora ‘potter’ 
d. *jaro 
e. *jaano 
f. *moro 


There are only around 200 nouns which appear multiple times because they 
are present as simplex forms and compounds. One could still remove them from 
the dataset. However, considering the examples in (55) and (56), we see that compounds do 
not guarantee consistent plural endings, and do not guarantee a simplex form. 
With this in mind, leaving the compounds in is not much different from 
having items where the last three or four segments are identical. We would not 
remove these cases, since they are the core of what the analogical process is. 
Similarly, the fact that compounds tend to belong to the same class as the simplex form 
seems to also be a product of the same principles. Finally, from a more cognitive 
perspective, the fact that there are many lexical entries with the same stem 
simply means that there are more chances to memorize that form. In any case, it 
seems more realistic to leave the compounds in. 


8.2.6 Results 


The dataset extracted from the dictionary had 1970 nouns. Considering all these 
nouns, the total number of classes (disregarding lengthening and diphthongiza- 
tion) was 98, with 48 classes having one or two members. Although possible in 
theory, in practical terms it is very difficult to fit and evaluate models with this 
kind of distribution. On the one hand, it is impractical because there are just not 
enough training data for most classes, and on the other hand, errors in the very 
low frequency classes will unfairly penalize the model's performance. For this 
reason I removed all items that belong to a class with a type frequency of 8 or 
less. The final dataset contains a total of 1792 nouns, distributed across 33 classes. 
This leaves us with a system that has more classes than any of the other examples 
discussed in this book. 
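
The filtering step described above amounts to counting the members of each class and discarding nouns in small classes. The sketch below shows one way to do this; the function name and the toy data are hypothetical, and only the threshold (classes with 8 or fewer members are removed) comes from the text.

```python
from collections import Counter


def drop_rare_classes(nouns, min_size=9):
    """Keep only nouns whose class has at least `min_size` members.

    `nouns` is a list of (noun, inflection_class) pairs. A minimum size of 9
    corresponds to removing classes with a type frequency of 8 or less.
    """
    sizes = Counter(cls for _, cls in nouns)
    return [(noun, cls) for noun, cls in nouns if sizes[cls] >= min_size]


# Hypothetical toy data: one class with 10 members, one with 8.
toy = [(f"n{i}", "a-i") for i in range(10)] + [(f"m{i}", "u-ru") for i in range(8)]
kept = drop_rare_classes(toy)
print(len(kept), sorted({cls for _, cls in kept}))
```

In the toy data only the ten-member class survives, since the second class falls exactly on the cut-off of 8.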

The predictors are: the last three segments of the singular stem (computed as 
the singular without the singular marker), the semantic annotation in the dic- 
tionary, the lengthening process (C lengthening, V lengthening, or none), the 
diphthongization process (none or present), the singular marker and the plural 
marker. As mentioned above, because ATR is a stem feature, I neutralized it for 
all stems. The length (in letters) of the stem and the tones of the singular form 
did not play any role in the models. 
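
To make the segmental predictors concrete, the following sketch derives the last three stem segments from a singular form and its marker. It rests on several assumptions that are not stated in the text: that `final.1` is the stem-final segment, that one character corresponds to one segment, and that short stems are padded with `#`. The function name is likewise hypothetical.

```python
def stem_predictors(singular, sg_marker, n=3, pad="#"):
    """Strip the singular marker and return the last `n` stem segments.

    final.1 is taken to be the stem-final segment, final.2 the one before it,
    and so on. Treating each character as one segment is a simplification.
    """
    if sg_marker and singular.endswith(sg_marker):
        stem = singular[: -len(sg_marker)]
    else:
        stem = singular
    padded = pad * n + stem  # guarantees at least n positions
    return {f"final.{i}": padded[-i] for i in range(1, n + 1)}


# cicwe 'spear' from (19), with the singular marker -e
print(stem_predictors("cicwe", "e"))
```

For a very short stem such as the one in bi ‘counter’ (stem b, singular marker -i), the missing positions are filled with the padding symbol.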

Because of its complexity, I will present several different models that tackle 
different parts of the system. The following sections describe the results for each 
such model. I will only look at clustering of the results for the last model predict- 
ing inflectional class. There are many more possible combinations I did not test, 
but the most important aspects of the system are covered. 


8.2.6.1 Predicting diphthongization 


The first case we look at is diphthongization in the plural. Since it is a binary 
choice, this is the simplest of the models for Kasem. The basic model (not including 
number markers) was: diphthong ~ final.1 + final.2 + final.3 + 
meaning.²⁷ Table 8.21 presents the results with the corresponding accuracy scores 
in Table 8.22. 


Table 8.21: Confusion matrix for the model predicting diphthongization 
without segmental number markers in Kasem 


Reference 
Prediction dp Ndp 
dp 267 66 
Ndp 79 1380 


Table 8.22: Accuracy scores for Table 8.21 


Overall Statistics 


Accuracy: 0.9191 

95% CI: (0.9055, 0.9313) 
No Information Rate: 0.8069 
Kappa: 0.7366 


Table 8.22 shows that the model has very good accuracy and kappa scores to 
start with. This shows that diphthongization is highly predictable. Next we test 

²⁷ For all Kasem models the networks only included a skip layer and no hidden layers, with a 
decay rate of 0.01. 
to see whether adding both number markers helps the model. We refit the analogical 
model with the formula: diphthong ~ final.1 + final.2 + final.3 + 
lengthening + meaning + pl + sg. The results can be seen in Table 8.23, and 
the corresponding accuracy values in Table 8.24. 


Table 8.23: Confusion matrix for the model predicting diphthongiza- 
tion with segmental number markers in Kasem 


Reference 
Prediction dp Ndp 
dp 303 46 
Ndp 43 1400 


Table 8.24: Accuracy scores for Table 8.23 


Overall Statistics 


Accuracy: 0.9503 

95% CI: (0.9392, 0.9599) 
No Information Rate: 0.8069 
Kappa: 0.8411 
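
The summary statistics in Tables 8.22 and 8.24 can be recomputed from the confusion matrices in Tables 8.21 and 8.23. The sketch below uses the standard definitions of accuracy, no-information rate and Cohen's kappa; the function name is illustrative, and the confidence intervals are not recomputed here.

```python
def summarize(matrix):
    """Accuracy, no-information rate and Cohen's kappa for a confusion matrix.

    matrix[i][j] is the number of items predicted as class i whose
    reference class is j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(k)) / total
    row_totals = [sum(row) for row in matrix]                             # predictions
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference
    # No-information rate: share of the most frequent reference class.
    no_information_rate = max(col_totals) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return accuracy, no_information_rate, kappa


table_8_21 = [[267, 66], [79, 1380]]   # rows: predicted dp, Ndp
table_8_23 = [[303, 46], [43, 1400]]

for name, m in (("Table 8.21", table_8_21), ("Table 8.23", table_8_23)):
    acc, nir, kappa = summarize(m)
    print(f"{name}: accuracy={acc:.4f} NIR={nir:.4f} kappa={kappa:.4f}")
```

Both matrices sum to the 1792 nouns of the final dataset, and the recomputed values agree with the reported accuracy, no-information rate and kappa to four decimal places.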


The overall evaluation is shown in Figure 8.14. There are several important 
observations. First of all, lengthening and meaning do not seem to play any role 
in the model when the other factors are considered. The final segment of the 
stem was the most predictive segment, and remained relevant even after adding 
both number markers. The other two segments seem to be somewhat redundant 
with the number markers, even though they played a role on their own. This is to 
be expected if there is a strong correlation between final segments and number 
markers. However, the fact that the final.1 was highly predictive even after 
adding the number marker, means that it is contributing to the analogical model 
independently of its predictive power of the segmental number markers. Finally, 
the singular marker was more predictive than the plural marker. This will be a 
recurring theme in this section: it is easier to predict plural markers (including 
lengthening and diphthongization) from the singular markers, than from other 
plural markers, and the other way around. There is no obvious explanation for 
this phenomenon. A possible reason is that the task of predicting a given plural 
marker usually follows from knowing the singular, and not from knowing other 
co-occurring plural markers. 


[Figure: accuracy and kappa by factor (baseline, final.1, final.2, final.3,
lengthening, pl, sg, meaning); additive and subtractive panels]


Figure 8.14: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting diphthongization with segmental
number markers in Kasem


8.2.6.2 Predicting lengthening 


The second feature in degree of complexity is the lengthening (or mora
insertion) in the plural. In this case we are dealing with a three-way choice:
no lengthening (NL), consonant lengthening (CC) and vowel lengthening (VV). The
best model (not including segmental number markers) was: lengthening ~ final.1
+ final.2 + final.3. The results of this model can be seen in Table 8.25 and the
corresponding statistics in Table 8.26.

This model is, once more, already quite good. The type of lengthening a stem
undergoes is highly predictable from its shape alone. In this case the semantics
did not play any role. Next, we fit a model that includes all other number
classes as predictors: lengthening ~ final.1 + final.2 + final.3 + diphthong +
pl + sg. Results for this model can be seen in Table 8.27 and the corresponding
statistics in Table 8.28.
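
The by-class statistics reported for the lengthening models follow directly from the confusion matrices. As an illustration, the sketch below recomputes them for the first model, using the counts of Table 8.25. The function name by_class_stats is hypothetical, and rows are again assumed to be predictions and columns reference classes.

```python
def by_class_stats(matrix, labels):
    """One-vs-rest statistics for each class of a confusion matrix.

    matrix[i][j] = number of items predicted as class i whose reference class is j.
    """
    k = len(labels)
    n = sum(sum(row) for row in matrix)
    stats = {}
    for c, label in enumerate(labels):
        tp = matrix[c][c]
        fp = sum(matrix[c]) - tp                          # predicted c, other reference
        fn = sum(matrix[i][c] for i in range(k)) - tp     # reference c, predicted other
        tn = n - tp - fp - fn
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        stats[label] = {
            "sensitivity": sensitivity,
            "specificity": specificity,
            "neg_pred_value": tn / (tn + fn),
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return stats


table_8_25 = [[49, 35, 5],
              [58, 979, 137],
              [18, 100, 411]]
stats = by_class_stats(table_8_25, ["CC", "NL", "VV"])
for label, values in stats.items():
    print(label, {name: round(v, 3) for name, v in values.items()})
```

The low sensitivity for CC (0.392) next to its high specificity (0.976) shows that the model rarely predicts consonant lengthening at all, something the overall accuracy alone hides.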

Table 8.25: Confusion matrix for the model predicting lengthening
without segmental number markers in Kasem


              Reference
Prediction     CC     NL     VV
       CC      49     35      5
       NL      58    979    137
       VV      18    100    411


The overall evaluation is shown in Figure 8.15. This figure shows a more
dramatic increase in both kappa and accuracy after adding the segmental number
markers. In this case both the singular and plural segmental markers had a very
similar importance. More interesting, however, is the fact that in this case we
see the opposite effect in the final three segments of the stem. In the previous
case of predicting diphthongization, only the final segment was independently
predictive of the outcome; here the penultimate and antepenultimate segments
are both independently predictive of the lengthening. This again goes to show
that different subtrees in the hierarchy have their own analogical relations for
their members. Finally, it is worth noting that when predicting diphthongization
there was no effect from adding lengthening as a predictor, and here there is
no effect from adding diphthong as a predictor. What this suggests is that the
correlations described before are already being captured by the final segments.
This is the first indication that there is heavy redundancy in the system. I will
come back to this in the following sections.


8.2.6.3 Predicting singular markers 


We now turn to predicting the singular marker of a word. Because I will be 
discussing many different models of related phenomena it would be tedious to 
present confusion matrices or heat maps for each of them. For this reason, I will 
only present the basic accuracy measures for model comparison. In the last sec- 
tion I will present the heat maps of the final models. 

In the first model we are looking at the bare effects of the final segments and 
meaning of the stems: singular ~ final.1 + final.2 + final.3 + meaning. 
This model tries to predict a total of 14 different markers: e, ia, i, u, a, 0, gu, NO, m,
na, go, ga, na, nu. The accuracy scores are shown in Table 8.29. 

Table 8.26: Accuracy scores for Table 8.25


Overall Statistics


Accuracy : 0.803
95% CI : (0.7838, 0.8212)
No Information Rate : 0.6217
Kappa : 0.6046


Statistics by Class:
                     Class: CC   Class: NL   Class: VV
Sensitivity              0.392       0.879       0.743
Specificity              0.976       0.712       0.905
Neg Pred Value           0.955       0.782       0.888
Balanced Accuracy        0.684       0.796       0.824


Table 8.27: Confusion matrix for the model predicting lengthening
with segmental number markers in Kasem


              Reference
Prediction     CC     NL     VV
       CC     103      7     11
       NL       4   1076     33
       VV      18     31    509


Table 8.28: Accuracy scores for Table 8.27


Overall Statistics


Accuracy : 0.942
95% CI : (0.9301, 0.9523)
No Information Rate : 0.6217
Kappa : 0.8869


Statistics by Class:
                     Class: CC   Class: NL   Class: VV
Sensitivity              0.824       0.966       0.920
Specificity              0.989       0.945       0.961
Neg Pred Value           0.987       0.944       0.964
Balanced Accuracy        0.907       0.956       0.940


Figure 8.15: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting lengthening with segmental
number markers in Kasem


Table 8.29: Accuracy scores for the model predicting the singular
marker from the stem information only


Overall Statistics


Accuracy: 0.5709

95% CI: (0.5476, 0.5939)
No Information Rate: 0.2037
Kappa: 0.5003


This model shows very good performance, especially considering the relatively
large number of classes it is predicting. This works as the initial baseline of
comparison. The next step is to include the plural marker as a predictor: singular
~ final.1 + final.2 + final.3 + meaning + pl. The accuracy scores are in
Table 8.30.


Table 8.30: Accuracy scores for the model predicting the singular 
marker from the stem and plural marker information 


Overall Statistics 


Accuracy: 0.8186 

95% CI: (0.8, 0.8362) 
No Information Rate: 0.2037 
Kappa: 0.7889 


The results in Table 8.30 show that there is a considerable gain from including 
the plural marker in the model. For comparison, using only the plural marker: 
singular ~ pl produces the results in Table 8.31. 


Table 8.31: Accuracy scores for the model predicting the singular 
marker from the plural marker information only 


Overall Statistics 


Accuracy: 0.6077 

95% CI: (0.5847, 0.6304) 
No Information Rate: 0.2037 
Kappa: 0.5348 
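
The 95% CI rows in these tables can be approximated from the accuracy and the number of nouns alone. The sketch below assumes that the intervals are exact binomial (Clopper-Pearson) intervals, which the text does not state, and that Table 8.31 corresponds to 1089 correct predictions out of 1792 nouns. That count is inferred from the reported accuracy of 0.6077 and the totals of the confusion matrices above. The function names are hypothetical.

```python
import math


def log_pmf(k, n, p):
    """Log of the binomial probability of k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def tail(k_from, k_to, n, p):
    """P(k_from <= X <= k_to) for X ~ Binomial(n, p)."""
    return sum(math.exp(log_pmf(k, n, p)) for k in range(k_from, k_to + 1))


def exact_ci(x, n, level=0.95):
    """Clopper-Pearson interval for x successes in n trials, by bisection."""
    alpha = 1 - level
    lower, upper = 0.0, 1.0
    if x > 0:
        lo, hi = 0.0, x / n
        for _ in range(60):
            mid = (lo + hi) / 2
            # P(X >= x) grows with p; find where it reaches alpha / 2.
            if tail(x, n, n, mid) < alpha / 2:
                lo = mid
            else:
                hi = mid
        lower = (lo + hi) / 2
    if x < n:
        lo, hi = x / n, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            # P(X <= x) shrinks with p; find where it reaches alpha / 2.
            if tail(0, x, n, mid) > alpha / 2:
                lo = mid
            else:
                hi = mid
        upper = (lo + hi) / 2
    return lower, upper


lower, upper = exact_ci(1089, 1792)
print(round(lower, 4), round(upper, 4))
```

Under these assumptions the result should lie very close to the interval reported in Table 8.31. Since the intervals of Tables 8.29, 8.30 and 8.31 do not overlap, the differences between the three models are unlikely to be due to chance.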


It should then be clear that although the effect of knowing the plural marker 
is considerable, it is even better when the model knows the shape of the singular 
stem. The overall results are shown in Figure 8.16, and the heat map for the model 
using only stem information is in Figure 8.17. 


8.2.6.4 Predicting plural markers 


We now try to predict the plural marker of a noun. In this case the predicted 
classes are: a, i, ru, u, io, nu, e, no, m, 0, si, en, iino, in. We first look at the basic 
model with only the final segments and meaning of the stem: plural ~ final.1 
+ final.2 + final.3 + meaning. The accuracy results are in Table 8.32. 


28 The reason for not using the plural stem in these cases is that the plural stem follows directly
from knowing the singular stem plus the dimensions of diphthongization and lengthening.

Figure 8.16: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting singular from the singular from

Figure 8.17: Heat map for the models predicting the singular marker
from the stem information only

Table 8.32: Accuracy scores for the model predicting the plural marker
from the stem information only


Overall Statistics 


Accuracy: 0.6345 

95% CI: (0.6117, 0.6568)
No Information Rate: 0.3265 
Kappa: 0.5528 


Next, we test the effect of adding the singular marker: plural ~ final.1 + 
final.2 + final.3 + meaning + sg. The results of this model are in Table 8.33. 


Table 8.33: Accuracy scores for the model predicting the plural marker 
from the stem and singular marker 


Overall Statistics 


Accuracy: 0.8867 

95% CI: (0.8711, 0.901) 
No Information Rate: 0.3265 
Kappa: 0.8615 


Table 8.33 shows that the plural marker is more predictable than the singular
marker. A possible simple explanation is that it is more common to want to
predict the plural of a noun from its singular form than to predict the
singular form of a noun from its plural. A very similar situation arises if we
try to predict the plural marker from the singular marker alone: plural ~ sg.
The results are in Table 8.34.

These results show a greater symmetry in the implicational relations. The over- 
all results and evaluation can be seen in Figure 8.18, and the heat map for the 
model using only the stem is in Figure 8.19. 


8.2.6.5 Predicting class 


Table 8.34: Accuracy scores for the model predicting the plural marker
from the singular marker information only


Overall Statistics


Accuracy: 0.7204

95% CI: (0.699, 0.7411)
No Information Rate: 0.3265
Kappa: 0.6468


Figure 8.18: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting plural from the singular stem in
Kasem


Figure 8.19: Heat map for the models predicting the plural marker from
the stem information only


Finally, we want to put these things together and predict inflectional class
(defined as the combination of a singular and a plural marker). So far I did not
include diphthongization and lengthening as part of the inflectional class. Doing
so would result in too many labels, which the model would have a very hard time
predicting. Additionally, as seen when predicting diphthongization and
lengthening, both these sub-trees are fairly predictable from the same factors.²⁹ I will
instead use both factors (diphthongization and lengthening) as predictors of class.
As before, there is no real limit to the possible combinations of factors and classes
one can test.

First we predict from the stem with a basic model that only looks at the ending
and meaning of the stem: class ~ final.1 + final.2 + final.3 + meaning.
The results are in Table 8.35 and the corresponding heat map in Figure 8.20.

Including lengthening and diphthong as predictors with the formula: class ~
final.1 + final.2 + final.3 + lengthening + diphthong + meaning,
produces a clear improvement. The results can be seen in Table 8.36, the
corresponding heat map in Figure 8.21, and the overall evaluation in
Figure 8.22.

29 This has the additional problem that it burdens the analogical model, since the factors will be
doing multiple jobs at the same time.


Table 8.35: Accuracy scores for the model predicting inflection class
from the stem only


Overall Statistics


Accuracy: 0.5335

95% CI: (0.5101, 0.5568)
No Information Rate: 0.1791
Kappa: 0.4928


Figure 8.20: Heat maps for the models predicting inflection from the
stem only


Table 8.36: Accuracy scores for the model predicting inflection class
from the stem, and lengthening and diphthongization information


Overall Statistics


Accuracy: 0.6596

95% CI: (0.6371, 0.6815)
No Information Rate: 0.1791
Kappa: 0.6303


Figure 8.21: Heat maps for the models predicting inflection class from
the stem, and lengthening and diphthongization information


Figure 8.22: Additive (left) and subtractive (right) accuracy and kappa
scores for the model predicting inflection class


In this case it is also useful to look at the balanced by-class accuracy of the
model. That is, we can look at how each level of the response variable (each
inflectional class) increases or decreases in accuracy as we add or subtract factors.
These results are shown in Figure 8.23. The interesting point here is that
different classes are not equally predictable. What this means is that there is
no homogeneous increase in the class accuracy. Instead, some classes like o-u or e-ə
achieve a very high balanced accuracy with the use of just one predictor, while
classes like ə-ə and ia-e remain quite unpredictable all the way through. This
indicates that class predictability is not symmetric, and that different classes focus
on different parts of the stem.

Finally, the clustering created by this model³⁰ presents several crucial results.
As in Spanish, this is the most interesting aspect of the models. The first thing
we can observe is that the larger (color-coded) clusters are not homogeneous with
respect to the features that seem to define them. There are several important
clusters to look at here. In the top left corner, in dark green, we find an inversion
between -i/-a and -a/-i, next to -u/-a, which fits the general pattern of an -ə with a
high vowel. To the right, and around -0.5 on the X axis, we find three classes: -a/-a,
-i/-i and -u/-u. The first two are close to each other and clustered together, while
the last class is clustered separately from the other two, although it is placed quite
close to them on the map.

Close by, and tightly grouped together, we find two clusters, one in dark blue and
one in light lilac. All classes in these two clusters share an -ia marker, except for
one, which only has an -a marker. In dark blue we see an inversion between -ia/-i and
-i/-ia, and in light lilac a partial inversion of -ia marking singular and plural. The
next color clusters are less well organized from the perspective of a potential
hierarchy, but their position makes sense. In the lower right corner we see three
classes that share an -o in the singular and -u in the plural, with some additional
-g-, -r- and -n-. Right at 0.5 X and -0.25 Y we find two other classes with a -ru
marking plural (again, close to the -o/-ru and -go/-ru classes).

Right at the center of the map we see three classes: -u/-iina, -m/-a and -a/-si.
These classes only share the -a marker (or /a/ segment in the case of -inna), but
they have in common that they have one marker not shared by any other class.
At the same X coordinate, but at around 0.5 Y, we have two close classes with a
-[+velar]a marker for the singular and -i in the plural, and not too far off we have
the very similar -na/-in class (arguably the class -ga/-@ is also related to these
three classes). A class that seems somewhat out of place is the -gu/-ru class, also
in dark orange. Finally, in the top right corner we have two groups. In light blue
we have classes with -a/-e plus additional markers, and in dark lilac we have the
inversion between -na/-m and -m/-na.


30 As before, we fit a direct similarity model instead of relying on the errors of the analogical
model.

A second important result that can be observed in this clustering is that the
presence or absence of p, n, r, s and m markers is not random on the map. All 
these markers only appear with positive values on the X axis. Similarly, most 
velar markers are in the upper right quadrant. What this indicates is that these 
markers cluster independently of the vocalic markers, lending some evidence to 
the hypothesis that each subtree in the hierarchy has its own analogical function. 

Important for the sketch of the system presented above is that, for most classes,
their position on the plane depends more on the vowels present, or their
combinations, than on what they mark. That is, -x/-y classes are close to other
classes with either -x or -y present, independently of whether -x and -y are marking
the same number. This is exactly what the suggested hierarchy would predict.

Finally, because of the complexity of the system, we can test whether there 
are extra similarity dimensions we are missing in this MDS plot. To do this, we 
extract three main components of the similarity matrix instead of two, and plot 
them side by side. This is similar to looking at a cube from three of its faces. In 
the plots in Figure 8.25, X is the first component, Y the second and Z the third. 
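
To make the idea of extracting a third component concrete, the following sketch implements classical (Torgerson) MDS in plain Python. It is an illustration only. It assumes a symmetric matrix of distances between classes as input, which may differ from how the similarity matrix was actually decomposed, and it uses power iteration, which only works when the leading eigenvalues are positive. The six points are invented so that the result can be checked, and the function name classical_mds is hypothetical.

```python
import math


def classical_mds(dist, k=3, iters=500):
    """Embed items in k dimensions from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring turns squared distances into inner products.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    def matvec(m, v):
        return [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]

    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        v = [float(i + 1) for i in range(n)]
        for _ in range(iters):                 # power iteration
            w = matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        norm_v = math.sqrt(sum(x * x for x in v))
        v = [x / norm_v for x in v]
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # eigenvalue estimate
        if lam <= 1e-9:
            break                              # no further positive component
        for i in range(n):
            coords[i][c] = math.sqrt(lam) * v[i]
        # Deflation: remove the component just found before the next one.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Invented points that genuinely need three dimensions.
points = [(3, 0, 0), (-3, 0, 0), (0, 2, 0), (0, -2, 0), (0, 0, 1), (0, 0, -1)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords3 = classical_mds(dist, k=3)
coords2 = classical_mds(dist, k=2)
```

With three components the pairwise distances of the toy points are recovered. With two, the last pair of points collapses onto the origin, which is the kind of information loss a two-dimensional map can hide.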

The XY plot shows the same map as before for comparison. The most interest- 
ing effect is found in the ZY plot. Here a strong grouping of the classes across 
vocalic lines appears. Classes with /o/ and /u/ are mostly on the lower quad- 
rants, and classes with /a/ and /i/ tend to be higher. Particularly interesting is 
the repositioning of -a/-i to the right quadrant, closer to other classes with the 
same sequence of vocalic markers. The XZ plot is less interesting, but it shows 
a much stronger separation of the purely vocalic class from classes with multi- 
ple exponents. Although the evidence is somewhat weaker, we see that different 
similarity dimensions capture what seems to be different aspects of the hierar- 
chy. 

What this decomposition shows is that the grouping effects between the 
classes go beyond two dimensions. That is, our two-dimensional representation
of class similarity can only capture a portion of the relevant information. This 
makes sense from a cross-classification perspective. Two classes might be sim- 
ilar to each other along some dimension, but different from each other along 
some other dimension. The MDS diagrams are only approximations of the actual 
similarity effects between classes. 


8.3 Interim Conclusion 


In this chapter I looked at two complex inflectional systems: Spanish verb
inflection and Kasem singular-plural classes. In Spanish, verbs are divided into three
main inflection classes: -ar, -er and -ir verbs. Additionally, a set of verbs shows
different kinds of vocalic and consonant stem alternations in the present tense
and past participle. Analogical models trained on the phonological shape of the 
stems could predict with high accuracy the main inflection class of verbs, and 
the stem alternation that verbs exhibit. The clustering based on stem similarity 
showed that verbs that undergo the same stem alternation have similar stems, 
even if they belong to different main inflection classes. 

I propose that these facts taken together constitute very strong evidence that 
the analogical relations do not only choose one of the trees in the hierarchy, but 
go up all of them. Naturally, this does not mean that we should always see perfect 
correlations, but rather that the correlations between the analogical relations and 
the grammatical hierarchy will be present. 

In Kasem, nouns can take a variety of different singular and plural markers. A 
key feature of this system is that individual markers can denote singular and plu- 
ral in different nouns. In addition to this, nouns can undergo diphthongization 
and vowel lengthening in the plural. These three dimensions (markers, length- 
ening and diphthongization) produce the inflection class of nouns. The analogi- 
cal models, trained on the phonological shape and meaning of the stems, could 
correctly distinguish these three dimensions, and predict with a high degree of 
accuracy the inflection class of nouns. The models showed that inflection class 
is almost equally predictable from the stem as it is from the singular or plural 
marker alone. 
The clustering analysis in Kasem showed that inflection classes that shared
the same markers clustered together, even if markers were flipped, i.e. marking 
singular in one class and plural in the other class. This means that the analogical 
relations must also hold at a more abstract level, and not just on the leaves of the 
hierarchy. This is because if nouns of classes ə-i and u-ə (as in many other cases
discussed above) are similar to each other, it means that at some level both classes
must share a general type ə underspecified for number.

Overall, this chapter shows that the kind of analogical classifiers proposed 
in this book can model very complex systems with many classes. It also shows that
analogical relations still reveal aspects of the hierarchy, even if said hierarchy
includes very complex interactions of multiple dimensions. 
9.1 The path forward


Although all fundamentals of the relation between analogy and formal gram- 
mar were covered, some relevant related topics still need to be considered. These 
would require discussions of their own. In this section I will briefly delineate 
them. 


9.1.1 The limits of analogy 


The approach to analogical classifiers presented in this book does not only apply
to complex systems. It also applies to systems such as the Korean nominative
marker, where nouns ending in a consonant take -i and nouns ending in a vowel 
take -ka. Since these simple cases can be modelled without the need for inheri- 
tance hierarchies or complex analogical systems, they raise the question of where 
the limits of analogy lie. Once analogical classifiers are in place in a grammar, it 
becomes easy to analyze these alternations as inflection classes. This, however, 
does not mean that analogical classifiers are necessarily always the right answer. 
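
To illustrate how little machinery such a simple case needs, the Korean alternation can be written as a single rule. The sketch below relies on the layout of precomposed Hangul syllables in Unicode, where a syllable without a final consonant has an offset from U+AC00 that is divisible by 28. The function name nominative and the two example nouns (사람 'person' and 나무 'tree') are added here for illustration.

```python
def nominative(noun: str) -> str:
    """Attach the nominative marker: 이 (-i) after a consonant, 가 (-ka) after a vowel."""
    if not noun:
        raise ValueError("empty noun")
    offset = ord(noun[-1]) - 0xAC00
    if not 0 <= offset <= 0xD7A3 - 0xAC00:
        raise ValueError("expected a noun ending in a precomposed Hangul syllable")
    # Each block of 28 code points shares onset and vowel; index 0 has no final consonant.
    has_final_consonant = offset % 28 != 0
    return noun + ("이" if has_final_consonant else "가")


print(nominative("사람"))  # 사람이
print(nominative("나무"))  # 나무가
```

No training data, similarity computation or hierarchy is involved. An analogical classifier could learn the same distribution, but nothing in the data requires one.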

This is a topic that needs further work. It requires a good theoretical footing 
and techniques that would allow us to evaluate what kind of approach is bet- 
ter suited for a given case. Analogical models can be compared in terms of their 
accuracy and coverage, but it is hard to compare analogical models to their alter- 
natives in these terms. 


9.1.2 Analogical classifiers or proportional analogies 


A similar and related question which would deserve a detailed treatment is the 
comparison of analogical classifiers and models of proportional analogies. As 
I discussed in Chapter 2, both analogical classifiers and proportional analogy 
models share some core assumptions but also diverge in some key properties. 
While analogical classifiers require an abstraction step which links lexemes to 
classes and classes to forms, proportional analogy models can directly link forms 
to forms. This makes proportional analogy a conceptually simpler system, but it 
is not completely clear that it can correctly handle all cases that analogical
classifiers can deal with. A thorough comparison of both approaches in relation
to complex and typologically diverse phenomena is still needed.


9.1.3 The features of analogy 


Probably the most intriguing question left unanswered is the one about the
relation between the nature of the morphological process and the position of the
analogical relations (Chapter 7). From a purely theoretical perspective, there is
no reason why analogy should care more about the final or initial segments of 
a word than the mid segments. Analogical models could be stronger in starting 
from the second phoneme or only take into account a subset of phonemes. Or, 
even more basic, it is unclear why analogy does not seem to take into account 
complete stems but only focuses on portions of stems. I have suggested that this 
is likely related to learning and usage. Tracking similarities in complete stems 
requires more effort than tracking similarities for only the edges of stems, which 
means that speakers might only track as much as they need and no more. But 
this is only a conjecture, and proper theoretical, computational and experimental 
work needs to address this question. 


9.1.4 Coverage 


Finally, a question I mostly ignored is that of coverage. None of the models 
reached 100% accuracy, but speakers of the languages studied show very little
uncertainty regarding the choices they have to make. It is not often the case that 
a Spanish speaker is uncertain about the conjugation of some verb (although 
from personal experience it does happen), or that a Russian speaker does not 
know what the diminutive of a noun should be (Gouskova et al. 2015). What this 
means is that analogical classifiers are much more precise than what we saw 
in this book. One of the reasons for the low accuracy in many cases was that 
the models had much less information than what an actual speaker would have. 
Spanish speakers do not just encounter verb stems but rather whole inflected 
forms, and they often see more than one of the stems of any verb. An impor- 
tant question still missing an answer is how accurate the analogical classifiers of 
speakers actually are, and how much information about inflection class is really 
contained in stems and in fully inflected forms. Similarly, we do not know how
much speakers actively rely on analogical relations found in the system, and how
much of it is just a leftover from historical processes.
9.2 Final considerations


The main proposal of this book is that the analogical relations responsible for 
class assignment operate on the hierarchies that define those same classes. 

I have shown that analogy as predictor of class membership is not solely re- 
stricted to one domain, or to just one language family, but can be found in gen- 
der assignment, number and case inflection, as well as verb conjugation classes, 
and derivational affix competition. I have looked at Romance, Germanic, Slavic, 
Oto-Manguean, Chadic, and Bantu languages. I have shown that the analogical 
approach I propose here generalizes well to a wide range of phenomena and lan- 
guages. 

Chapter 5 presented two cases of interactions between gender and inflection 
class taken from Latin and Romanian. I proposed that using cross-classification 
in the hierarchy between gender and inflection class could easily capture these 
interactions, and showed that the analogical models closely reflected these hier- 
archies. 

In Chapter 6, I explored overabundance and derivational doubletisms. In these 
cases, there are two mutually exclusive markers/suffixes which express the same 
meaning. A set of lexemes can only combine with one of the two, while a second 
set of lexemes can combine with several. The Croatian example illustrated this 
with the markers for the instrumental singular, which can be -em or -om. In Rus- 
sian, I explored the alternation between the three diminutive suffixes -ik, -chik, 
and -ok. 

Chapter 7 looked at a different aspect of analogical models. In this chapter I
presented evidence for the claim that the nature of the morphological processes at
play has an impact on the kinds of features that the analogical relations take into 
account. Swahili and Otomi use prefixes to mark inflection of nouns and verbs, 
respectively. In both cases, the initial segments of the stem were more important 
for the analogical model than the final segments. In Hausa, plural formation in- 
cludes broken plurals which keep the last consonant of the singular form of the 
noun but change the penultimate and final vowels. In this case, the analogical 
model found the vowels of the singular were the most relevant predictors. 

Finally, Chapter 8 explored two cases where inflection of verbs (Spanish) and 
nouns (Kasem) comprise several independent levels. In Spanish, verbs can belong 
to three main inflection classes but also undergo several different stem changing 
processes. In Kasem, nouns can belong to one of many different inflection classes 
(understood as the combination of singular and plural markers), and also un- 
dergo lengthening and diphthongization. To capture these different dimensions 
of inflection, I proposed hierarchies where individual processes are captured by
independent subtrees but come together to form the complete inflection classes. 
The analogical models fitted to these cases showed a strong correlation with the 
proposed hierarchies and also showed a certain degree of organization along the 
different subtrees. 

In this book I have presented a way of understanding analogy as a type con- 
straint (ATC). This model consists of two basic building blocks: a type hierarchy 
and individual analogical constraints. The type hierarchy captures all common 
properties between inflection or derivation classes, and organizes the individual 
lexemes according to their morphological behavior. The analogical constraints 
operate on a type by type basis, specifying the phonological and semantic prop- 
erties lexemes that belong to a certain type must fulfill. The innovative key aspect 
of this model is that analogical constraints work on a binary basis, and that all 
types, both concrete and abstract, can impose analogical constraints. 

Given a hierarchy of classes for some inflectional or derivational system, for 
every class in the hierarchy, a series of analogical constraints determine what 
phonological and semantic features items belonging to that class must satisfy. 
This model allows for a straightforward integration of analogy into the grammar 
while keeping them distinct and modular. The ATC model makes the prediction
that analogical relations will show reflexes of the hierarchy. In Part II, I presented 
evidence from six case studies that support this claim. These case studies showed 
that the structure of the hierarchy clearly has reflexes on the analogical relations. 
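
The two building blocks of the model can be pictured with a small data structure. The sketch below is a deliberately simplified illustration and not the formalization given in this book. The binary constraints are hand-written predicates standing in for trained analogical classifiers, the type names are invented, and the way constraints are checked along the path to the root is an assumption about how the pieces combine.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LexType:
    """A node in the type hierarchy with its own binary analogical constraint."""
    name: str
    constraint: Callable[[dict], bool]   # True if the lexeme fits this type
    parent: Optional["LexType"] = None

    def admits(self, lexeme: dict) -> bool:
        # A lexeme must satisfy the constraint of the type and of every supertype,
        # so abstract types constrain their subtypes as well.
        node = self
        while node is not None:
            if not node.constraint(lexeme):
                return False
            node = node.parent
        return True


# Invented toy hierarchy: one abstract type with two concrete subtypes.
noun = LexType("noun", lambda lex: True)
vocalic = LexType("vocalic", lambda lex: lex["stem"][-1] in "aeiou", parent=noun)
class_a = LexType("class-A", lambda lex: lex["stem"].endswith("a"), parent=vocalic)
class_b = LexType("class-B", lambda lex: lex["stem"].endswith("u"), parent=vocalic)


def candidate_classes(lexeme, leaves):
    """Names of the concrete types whose constraints the lexeme satisfies."""
    return [t.name for t in leaves if t.admits(lexeme)]


print(candidate_classes({"stem": "kala"}, [class_a, class_b]))  # ['class-A']
```

Because the abstract type carries a constraint of its own, items of class-A and class-B necessarily share the property it imposes. This is the kind of reflex of the hierarchy in the analogical relations that the case studies looked for.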


